Google’s bold Gemma 4 bet targets Meta’s hold on developers
April 2, 2026
Google DeepMind launched its most capable open-weight model yet, and it runs powerful AI agents directly on smartphones, laptops, and even a Raspberry Pi.
Google DeepMind launched Gemma 4 on Wednesday, releasing what it describes as the most capable open-weight AI model family it has ever built. Available immediately under an Apache 2.0 license, the release is aimed squarely at developers who want to build sophisticated AI applications without routing every request through a cloud server or paying per-token fees to a closed platform.
The timing is deliberate. Meta’s Llama 4 and Mistral’s models have been the go-to options for developers working in open-source AI, and Google has spent years trying to close that gap. Gemma 4 is the company’s most direct attempt yet to change that dynamic, with a specific focus on agentic workflows: the kind of multi-step autonomous reasoning that has become a priority for enterprises building AI into their operations.
What Gemma 4 actually does
The model family comes in four sizes. The first two are the Effective 2B and Effective 4B models, smaller builds designed to run on edge devices such as Android smartphones and Raspberry Pi computers. The third is a 26B Mixture of Experts model that activates only 3.8 billion parameters during inference, letting it run at high speed without losing the depth of a much larger model. The fourth is a 31B Dense model, which currently ranks third among open models on the industry-standard open LLM leaderboard.
All four models support native function calling and structured JSON outputs, meaning developers no longer need to retrofit their applications to get the models to interact with outside tools. Earlier versions of Gemma required that kind of extra work. Gemma 4 removes it.
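In practice, native function calling means the application advertises tool schemas to the model, the model replies with a structured JSON call, and the application parses and dispatches it. The sketch below illustrates only that dispatch pattern; the schema shape, tool name, and hard-coded model reply are assumptions for illustration, not Gemma 4’s actual wire format.

```python
import json

# Illustrative tool registry; a real app would describe these schemas
# to the model in its request. The shape here is an assumption.
TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city",
        "parameters": {"city": {"type": "string"}},
    },
}

def get_weather(city: str) -> str:
    # Stand-in implementation; a real app would call a weather API.
    return f"Sunny in {city}"

def dispatch(model_reply: str) -> str:
    """Parse a structured JSON function call and invoke the matching tool."""
    call = json.loads(model_reply)
    name, args = call["name"], call["arguments"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return globals()[name](**args)

# Hard-coded stand-in for what a model with native function calling
# would return as its structured output.
reply = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch(reply))  # Sunny in Zurich
```

The point of native support is that the JSON arriving in `dispatch` comes from the model reliably formatted, rather than scraped out of free-form text by the application.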
Every model in the family can process images and videos. The two smaller models go further with native audio input support, enabling real-time speech understanding directly on device. Context windows have also been expanded, reaching 128K tokens for the smaller models and 256K for the two larger ones. That means a developer can feed an entire codebase or a large document library into a single prompt without hitting a ceiling.
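At that scale, the practical workflow is to concatenate a project’s files into a single prompt and sanity-check the rough token count against the window before sending it. A minimal sketch of that idea follows; the 4-characters-per-token heuristic, the file extensions, and the file-header format are all assumptions, not anything Gemma 4 prescribes.

```python
import os

CONTEXT_WINDOW = 256_000  # the larger Gemma 4 models, per the announcement

def build_prompt(root: str, exts=(".py", ".md")) -> str:
    """Concatenate matching files under `root` into one prompt string."""
    parts = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    parts.append(f"### FILE: {path}\n{f.read()}")
    return "\n\n".join(parts)

def rough_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text/code.
    return len(text) // 4

if __name__ == "__main__":
    prompt = build_prompt(".")
    if rough_tokens(prompt) > CONTEXT_WINDOW:
        raise SystemExit("codebase exceeds the 256K window; chunk it instead")
    print(f"~{rough_tokens(prompt)} tokens, within the window")
```

A real pipeline would use the model’s own tokenizer for an exact count; the heuristic only catches gross overruns before an expensive call.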
Running AI without an internet connection
One of the more significant aspects of the Gemma 4 launch is what it means for offline AI. Google’s LiteRT-LM runtime allows the Effective 2B model to run using under 1.5 gigabytes of memory on supported devices. On a Raspberry Pi 5, it processes 4,000 input tokens across two distinct tasks in under 3 seconds. That kind of performance on constrained hardware opens up use cases in smart home systems, voice assistants, and robotics that previously required a cloud connection to function.
For mobile and desktop developers, Gemma 4 runs on Android and iOS with CPU and GPU support, and on Windows, Linux, and macOS. Browser-based execution through WebGPU is also supported. Google is also launching a new Python package and a command-line tool; the latter lets developers experiment with Gemma 4 without writing any code at all.
Where Google stands in the open AI race
Google DeepMind researchers described the Gemma 4 family as delivering more intelligence per parameter than any previous release, and the 31B Dense model’s third-place ranking on open model benchmarks gives that claim some weight. The Apache 2.0 license removes many of the commercial restrictions that have made other open models less attractive for enterprise use, which analysts say could be a meaningful differentiator.
Constellation Research analyst Holger Mueller noted that Google is building its AI lead on two fronts simultaneously, pushing Gemini for its own ecosystem while using Gemma to win over independent developers. He described Gemma 3 as setting a high bar and said expectations for this release were correspondingly elevated.
The competitive picture is busy. Anthropic released Claude 3.5 Opus with enhanced reasoning capabilities around the same time, and OpenAI continues to develop its o1 reasoning line. Meta has the distribution advantage through its extensive Llama partnerships. Whether Gemma 4 moves developers away from those entrenched options will become clearer over the coming weeks as production testing begins.
Gemma 4 is available now on Google Cloud, Hugging Face, Kaggle, and Ollama.