Google upgrades Gemini 1.5 Pro and is about to launch new Gemini 1.5 Flash models and Gemma 2

By Brain Titan | May 2024

Google released a number of updates at I/O, including improvements to Gemini 1.5 Pro, new Gemini 1.5 Flash models, new models for the Gemma family, and new features and pricing options for the Gemini API.

Gemini 1.5 Pro Improvements and 1.5 Flash Model

Gemini 1.5 Pro

  • Quality enhancements: improvements to translation, coding, reasoning, and other key use cases let the model handle a wider and more complex range of tasks.
  • 1 million token context window: supports long-context input, so more information can be processed in a single request.
  • Multimodal support: handles multiple input types, including text, images, audio, and video (see the sketch after this list).
  • 2 million token context window: available in private preview; access is via the waitlist in Google AI Studio or Vertex AI.
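
For developers, these capabilities are reached through the Gemini API. The snippet below is a minimal sketch using the google-generativeai Python SDK, assuming an API key from Google AI Studio; the file name, prompt, and placeholder key are illustrative, not taken from Google's documentation.

```python
import google.generativeai as genai
from PIL import Image

# Configure the SDK with an API key from Google AI Studio (placeholder value).
genai.configure(api_key="YOUR_API_KEY")

# Select the upgraded long-context, multimodal model.
model = genai.GenerativeModel("gemini-1.5-pro")

# Mix text and image input in a single request; audio and video files
# can also be attached via the File API.
image = Image.open("chart.png")  # illustrative file name
response = model.generate_content(
    [image, "Summarize the trend shown in this chart in two sentences."]
)
print(response.text)
```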

Gemini 1.5 Flash

  • Introduction: Gemini 1.5 Flash is a lightweight model optimized for speed and efficiency. It is suited to high-frequency, high-volume tasks and is the fastest Gemini model available in the API.
  • Features: a breakthrough long context window (1 million tokens) for multimodal reasoning, plus a ‘distillation’ process that transfers the most important knowledge and skills from 1.5 Pro into a smaller, more efficient model.
  • Applications: summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more (a usage sketch follows this list).
  • Optimized response time: built for high-frequency tasks that require fast responses.
  • 1 million token context window: like 1.5 Pro, supports long-context input.
  • Multimodal support: the same text, image, audio, and video input as 1.5 Pro.
  • Worldwide availability: both models are available in preview in over 200 countries and will be generally available in June.
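
As a rough illustration of the high-volume, low-latency work Flash targets, the sketch below streams short summaries with the same google-generativeai SDK; only the model name changes relative to Pro, and the placeholder documents and prompt are assumptions for the example.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Flash shares the API surface with 1.5 Pro; only the model name differs.
flash = genai.GenerativeModel("gemini-1.5-flash")

documents = ["<support ticket 1>", "<support ticket 2>"]  # placeholder inputs

# High-frequency use case: one-line summaries, streamed back as they are generated.
for doc in documents:
    response = flash.generate_content(
        f"Summarize this support ticket in one line:\n{doc}",
        stream=True,
    )
    for chunk in response:
        print(chunk.text, end="")
    print()
```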

Gemini Nano

  • Multimodal input: Gemini Nano now supports not only text input but also image input, so the model can make sense of the world through text, images, sounds, and spoken words.
  • Platform support: launching first on Pixel devices, where Nano’s multimodal capabilities will enhance the on-device experience.

Here’s more about the new features and improvements of Google AI on Android.

Circle to Search and Homework Help

  • Feature: Circle to Search lets users search anything on their phone with a simple gesture, without switching apps.
  • New Capabilities: It now helps students with homework problems, providing step-by-step instructions for solving physics and math problems. This will be expanded in the future to include more complex problems, including symbolic formulas, diagrams, and more.
  • Status: Circle to Search is available on more than 100 million devices, with plans to double coverage by the end of the year.

Gemini on Android Update

  • Features: Gemini is a new AI assistant that utilizes generative AI to help users boost creativity and productivity.
  • Improvements: enhanced understanding of on-screen content and app context. Users can invoke Gemini directly while using an app, for example dragging and dropping a generated image into Gmail or Google Messages, or asking about specific information in a YouTube video.
  • Advanced Features: Gemini Advanced allows users to quickly find answers in PDFs without scrolling through multiple pages.

Gemini Nano’s Full Multimodal Capabilities

  • Functionality: Android was the first mobile operating system to ship with a built-in, on-device foundation model. Nano’s multimodal capabilities are coming soon, starting with Pixel devices.
  • Capabilities: Not only can it process text input, but it can also understand contextual information such as images, sounds and spoken words.

Clearer descriptions for TalkBack

  • Update: Gemini Nano’s multimodal capabilities will help visually impaired users get clearer descriptions of images, whether it’s a family photo or the details of an item of clothing when shopping online.
  • Benefits: These descriptions are fast and do not require an Internet connection.

Scam Call Detection Alerts

  • Functionality: Gemini Nano detects conversation patterns commonly associated with scams during a call and sends real-time alerts, for example when a supposed bank representative requests an urgent transfer or personal information.
  • Privacy: processing happens entirely on the device, so the conversation stays private.

New Developer Features and Pricing Options

New Developer Features

  • Video frame extraction: video files can be supplied as input, with frames sampled from the video for further analysis and processing (see the sketch after this list).
  • Parallel function calls: the model can return multiple function calls in a single response, improving processing efficiency (also covered in the sketch below).
  • Context caching: starting in June, developers can send large files or long prompts to the model only once, making long-context use more efficient and economical.
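
The sketch below illustrates the first two items using the google-generativeai Python SDK as I understand it: a video is uploaded through the File API so the service can sample frames from it, and a request with two tools may come back with multiple function calls in one turn. The function names, file names, and polling loop are illustrative assumptions, not Google's official sample.

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# --- Video input via the File API -----------------------------------------
# Frames are sampled from the uploaded video on the service side; the client
# only uploads the file and waits for processing to finish.
video = genai.upload_file(path="demo.mp4")  # illustrative file name
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-flash")
print(model.generate_content([video, "List the key scenes in this video."]).text)

# --- Parallel function calls -----------------------------------------------
# Two hypothetical tools; the model may request both in a single response.
def dim_lights(brightness: float) -> bool:
    """Dims the lights to the given brightness (0.0 to 1.0)."""
    return True

def start_music(genre: str) -> bool:
    """Starts playing music of the given genre."""
    return True

tool_model = genai.GenerativeModel("gemini-1.5-pro", tools=[dim_lights, start_music])
response = tool_model.generate_content("Set the room up for a relaxed evening.")

# Each requested call arrives as a separate part of the same response.
for part in response.candidates[0].content.parts:
    if part.function_call:
        print(part.function_call.name, dict(part.function_call.args))
```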

Pricing options

  • Free access: Gemini API access is available for free through Google AI Studio in eligible regions.
  • Pay-as-you-go: a new pay-as-you-go tier supports higher rate limits, giving developers the flexibility to scale usage as needed.
  • Pricing details: full pricing is published on the Gemini API pricing page.

Additional models in the Gemma family

PaliGemma

  • Visual-language open model: optimized for image captioning, visual question answering, and other image understanding tasks (a minimal inference sketch follows this list).
  • Pre-trained variants: it joins existing pre-trained Gemma variants such as CodeGemma and RecurrentGemma, giving developers more options.
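
A minimal captioning sketch, assuming the PaliGemma checkpoints published on Hugging Face and the accompanying transformers integration; the model ID and "caption en" prompt prefix follow that release, while the image path is illustrative.

```python
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # assumed mixed-task checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("photo.jpg")  # illustrative image
# PaliGemma is prompted with short task prefixes such as "caption en".
inputs = processor(text="caption en", images=image, return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0], skip_special_tokens=True))
```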

Gemma 2

  • Next-generation Gemma models: built on a new architecture that delivers breakthrough performance and efficiency; the 27-billion-parameter Gemma 2 rivals Llama 3 70B in performance.
  • Efficient operation: Gemma 2 runs efficiently on NVIDIA GPUs or a single TPU host, letting developers and researchers deploy it at lower cost (see the sketch after this list).
  • June launch: the official release is planned for June, addressing developer demand for larger, easier-to-use open models.
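
Earlier Gemma releases shipped through Hugging Face, so loading Gemma 2 will presumably follow the same pattern once the weights are published; the model ID below is an assumption about the upcoming 27B instruction-tuned release, and the bfloat16/device_map settings reflect the single-accelerator-host claim above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-27b-it"  # assumed ID for the instruction-tuned 27B model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single accelerator host
    device_map="auto",
)

prompt = "Explain in one sentence why long context windows matter."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```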
