News from Google – Gemma 2 is now available!
Introducing Gemma 2: Advanced AI for Everyone
Expanding Access to AI
AI has the potential to solve some of humanity’s most pressing issues, but this can only happen if everyone has the tools to build with it. Earlier this year, Google introduced Gemma, a family of lightweight, state-of-the-art open models based on the same research and technology used to create the Gemini models. They’ve since expanded the Gemma family with CodeGemma, RecurrentGemma, and PaliGemma, each offering unique capabilities for various AI tasks. These models are easily accessible through integrations with partners like Hugging Face, NVIDIA, and Ollama.
Launching Gemma 2
Google is now officially releasing Gemma 2 to researchers and developers worldwide. Available in both 9 billion (9B) and 27 billion (27B) parameter sizes, Gemma 2 offers higher performance and greater efficiency than its predecessor, along with significant safety enhancements. The 27B model provides competitive alternatives to models more than twice its size, achieving performance levels that were only possible with proprietary models as recently as last December. This performance is now achievable on a single NVIDIA H100 Tensor Core GPU or TPU host, significantly reducing deployment costs.
Setting a New Standard for Efficiency and Performance
Gemma 2 is built on a redesigned architecture, engineered for exceptional performance and inference efficiency. Here’s what sets it apart:
- Outstanding Performance: The 27B Gemma 2 delivers the best performance in its size class, even rivaling models more than twice its size. The 9B model also leads its category, outperforming models like Llama 3 8B.
- Efficiency and Cost Savings: The 27B model runs efficiently at full precision on a single Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, or NVIDIA H100 Tensor Core GPU, reducing costs while maintaining high performance. This makes AI deployments more accessible and budget-friendly.
- Fast Inference Across Hardware: Gemma 2 is optimized for speed across various hardware setups, from powerful gaming laptops and high-end desktops to cloud-based environments. You can try Gemma 2 at full precision in Google AI Studio, run the quantized version with Gemma.cpp on your CPU, or run it on a home computer with an NVIDIA RTX or GeForce RTX GPU via Hugging Face Transformers.
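To see why a 27B model can fit on a single accelerator while quantized versions reach consumer hardware, a back-of-envelope estimate of weight storage at different precisions helps. This sketch ignores activations, KV cache, and framework overhead, so the real footprint is somewhat higher; the parameter count and precisions are illustrative:

```python
# Rough memory needed just to hold model weights, by numeric precision.
# Ignores activations, KV cache, and runtime overhead; illustrative only.

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gib(num_params: float, precision: str) -> float:
    """Approximate weight storage in GiB for a model with num_params parameters."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30

for precision in ("bfloat16", "int8", "int4"):
    gib = weight_memory_gib(27e9, precision)
    print(f"27B weights @ {precision}: ~{gib:.1f} GiB")
```

At bfloat16 the 27B weights come to roughly 50 GiB, comfortably inside an 80 GB A100 or H100, while 4-bit quantization brings them down to around 13 GiB, which is why quantized builds can run on high-end consumer GPUs or CPUs.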
Designed for Developers and Researchers
Gemma 2 is not only more powerful but also easier to integrate into your workflows:
- Open and Accessible: Gemma 2 is available under a commercially-friendly Gemma license, allowing developers and researchers to share and commercialize their innovations.
- Broad Framework Compatibility: Use Gemma 2 with major AI frameworks like Hugging Face Transformers, JAX, PyTorch, and TensorFlow via native Keras 3.0, vLLM, Gemma.cpp, Llama.cpp, and Ollama. Optimized with NVIDIA TensorRT-LLM, it can run on NVIDIA-accelerated infrastructure or as an NVIDIA NIM inference microservice. Fine-tuning is possible today with Keras and Hugging Face, with additional parameter-efficient fine-tuning options in development.
- Effortless Deployment: Starting next month, Google Cloud customers can easily deploy and manage Gemma 2 on Vertex AI.
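To make the Hugging Face Transformers path above concrete, here is a minimal sketch of local inference. The instruction-tuned 9B model ID and the generation settings are illustrative assumptions; the weights are license-gated, so you must accept the Gemma license on Hugging Face and authenticate before downloading:

```python
# Sketch: local inference with Gemma 2 via Hugging Face Transformers.
# Assumes the gated weights are accessible (license accepted, logged in).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-2-9b-it"  # instruction-tuned 9B variant (assumed ID)

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model lazily and generate a completion for the prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # half precision to fit consumer GPUs
        device_map="auto",           # place layers on available GPU/CPU
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example (needs roughly 20 GB of GPU memory, or CPU offload):
#   print(generate("Explain open-weight models in one sentence."))
```

The same pattern applies to the 27B model by swapping the model ID; vLLM, Llama.cpp, and Ollama offer their own loading paths for the same weights.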
Supporting Responsible AI Development
Google is committed to providing resources for responsible AI development, including its Responsible Generative AI Toolkit. The recently open-sourced LLM Comparator supports in-depth evaluation of language models, and its companion Python library can run comparative evaluations and visualize the results. Google is also working on open-sourcing its text watermarking technology, SynthID, for Gemma models.
When training Gemma 2, Google followed rigorous safety processes, filtering pre-training data and performing extensive testing and evaluation to identify and mitigate potential biases and risks. They publish their results on public benchmarks related to safety and representational harms.
Projects Built with Gemma
The first Gemma launch led to over 10 million downloads and numerous inspiring projects. For instance, Navarasa used Gemma to create a model rooted in India’s linguistic diversity.
Looking Ahead
Gemma 2 will enable even more ambitious projects, unlocking new levels of performance and potential in AI creations. Google will continue to explore new architectures and develop specialized Gemma variants for a broader range of AI tasks and challenges, including an upcoming 2.6B parameter model designed to bridge the gap between lightweight accessibility and powerful performance.
Getting Started
Gemma 2 is now available in Google AI Studio, allowing you to test its full performance capabilities at 27B without hardware requirements. You can also download Gemma 2’s model weights from Kaggle and Hugging Face Models, with Vertex AI Model Garden coming soon.
To support research and development, Gemma 2 is accessible free of charge through Kaggle or the free tier of Colab notebooks. First-time Google Cloud customers may be eligible for $300 in credits. Academic researchers can apply to the Gemma 2 Academic Research Program to receive Google Cloud credits to accelerate their research with Gemma 2. Applications are open through August 9.