Gemma 4 Free

Chat with Google's most capable open AI model for free. Multimodal, 256K context, 140+ languages. No login required.


Gemma 4 Free Models

Mobile / Embedded

Gemma 4 E2B

2.3B effective / 5.1B total

Ultra-lightweight with Per-Layer Embeddings (PLE). Runs on Raspberry Pi, smartphones, and IoT devices. Native audio input for voice applications.

Context: 128K
Modalities: Text, Image, Audio
Hardware: 8GB RAM
Best for: Voice assistants, mobile apps, IoT
Laptop / Desktop

Gemma 4 E4B

4.5B effective / 8B total

Best balance of quality and efficiency for on-device AI. Full multimodal with native audio. Runs on consumer GPUs and Apple Silicon Macs.

Context: 128K
Modalities: Text, Image, Audio
Hardware: 6GB VRAM
Best for: Local chatbots, code assistants, offline AI
Best Value

Gemma 4 27B MoE

3.8B active / 26B total

Mixture-of-Experts: 128 experts, only 8 active per token. Achieves near-flagship performance at a fraction of the compute cost. #6 on LMArena with just 3.8B active parameters.

LMArena: #6 (ELO 1441)
Context: 256K
Modalities: Text, Image
Hardware: ~15GB VRAM (Q4)
Best for: Production chatbots, API serving, cost-sensitive inference
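The routing step behind the MoE numbers above (128 experts, 8 active per token) can be sketched in a few lines. This is a generic top-k softmax gate, not Gemma's actual router; the logits here are random placeholders:

```python
import math
import random

NUM_EXPERTS = 128   # experts per MoE layer (per the spec above)
TOP_K = 8           # experts activated per token

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits):
    """Select the TOP_K highest-scoring experts and renormalize their gate weights."""
    top = sorted(range(len(router_logits)), key=router_logits.__getitem__, reverse=True)[:TOP_K]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routing = route_token(logits)  # 8 (expert_id, weight) pairs; weights sum to 1
```

Only the 8 selected experts run a forward pass for that token, which is why a 26B-parameter model can bill itself at ~3.8B active compute.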
Flagship

Gemma 4 31B

31B dense

Maximum quality dense model. #3 on LMArena globally. 89.2% AIME 2026 math, 80% LiveCodeBench coding, 94.1% HumanEval. Best foundation for fine-tuning and enterprise deployment.

LMArena: #3 (ELO 1452)
Context: 256K
Modalities: Text, Image
Hardware: ~20GB VRAM (Q4)
Best for: Enterprise AI, fine-tuning, research, complex reasoning

What Gemma 4 Can Do

Text

256K tokens context
  • Process entire codebases and long documents in a single prompt
  • 140+ languages with strong multilingual performance
  • Built-in reasoning (thinking mode) for complex math and logic
  • Native function calling and structured JSON output for agentic workflows
  • Codeforces ELO 2150 — above 98% of human competitive programmers
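The function-calling bullet above can be made concrete: a minimal sketch of validating a structured JSON tool call emitted by the model. The `get_weather` tool and its schema are invented for illustration and are not part of any Gemma API:

```python
import json

# Hypothetical tool schema you might describe to the model (names are illustrative).
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def parse_tool_call(model_output: str):
    """Parse and validate a structured JSON tool call from the model's reply."""
    call = json.loads(model_output)
    if call.get("name") != WEATHER_TOOL["name"]:
        raise ValueError(f"unknown tool: {call.get('name')}")
    args = call.get("arguments", {})
    for required in WEATHER_TOOL["parameters"]["required"]:
        if required not in args:
            raise ValueError(f"missing argument: {required}")
    return call["name"], args

# Simulated structured output from the model:
reply = '{"name": "get_weather", "arguments": {"city": "Tokyo"}}'
name, args = parse_tool_call(reply)
```

In an agentic loop, the validated `(name, args)` pair is dispatched to the real function and the result is fed back to the model as the next turn.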

Vision

Image & video understanding
  • Configurable image token budgets (70–1120 tokens) for speed-quality tradeoff
  • OCR, chart interpretation, diagram understanding, visual reasoning
  • Variable aspect ratio with ViT encoder (16×16 patches, 2D RoPE)
  • 76.9% MMMU Pro, 85.6% MATH-Vision on 31B model
  • Process video as multi-frame sequences
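The speed-quality tradeoff from configurable image token budgets reduces to simple arithmetic. A sketch, assuming a hypothetical 4,000 tokens reserved for the text prompt; only the 70–1120 budget range and the 256K window come from this page:

```python
CONTEXT_WINDOW = 256_000            # tokens (model spec above)
MIN_BUDGET, MAX_BUDGET = 70, 1120   # supported per-image token budgets

def max_frames(token_budget: int, reserved_for_text: int = 4_000) -> int:
    """How many images/video frames fit in context at a given per-image budget?"""
    if not MIN_BUDGET <= token_budget <= MAX_BUDGET:
        raise ValueError("budget outside supported range")
    return (CONTEXT_WINDOW - reserved_for_text) // token_budget

fast = max_frames(70)       # low detail per frame, many frames
detailed = max_frames(1120)  # high detail per frame, far fewer frames
```

Low budgets suit long video sequences; high budgets suit dense OCR or chart reading on a handful of images.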

Audio

E2B & E4B models
  • Native speech recognition via USM-style conformer encoder (~300M params)
  • No separate ASR pipeline needed — direct audio-to-text understanding
  • Ideal for on-device voice assistants and real-time transcription
  • Available on edge models (E2B, E4B) for mobile deployment

Gemma 4 Benchmarks

MMLU Pro
Gemma 4 31B: 85.2 · Qwen 3.5 27B: 86.1 · Llama 4 Scout: 81.0 · GPT-4o: 83.5

AIME 2026
Gemma 4 31B: 89.2 · Qwen 3.5 27B: 85.0 · Llama 4 Scout: 72.0 · GPT-4o: 76.0

LiveCodeBench v6
Gemma 4 31B: 80.0 · Qwen 3.5 27B: 72.0 · Llama 4 Scout: 68.0 · GPT-4o: 70.0

GPQA Diamond
Gemma 4 31B: 84.3 · Qwen 3.5 27B: 85.5 · Llama 4 Scout: 74.0 · GPT-4o: 78.0

Gemma 4 vs Qwen 3.5 Comparison

                          Gemma 4 31B            Qwen 3.5 27B
LMArena Rank              #3 (1452)              #4 (1448)
AIME 2026 (Math)          89.2%                  ~85%
LiveCodeBench (Code)      80.0%                  72.0%
MMLU Pro (Knowledge)      85.2%                  86.1%
GPQA Diamond (Science)    84.3%                  85.5%
Multimodal                Text + Image + Video   Text + Image
Audio Input               Yes (E2B/E4B)          No
Context Window            256K                   128K
Languages                 140+                   29
License                   Apache 2.0             Apache 2.0
Edge Models               E2B (2B), E4B (4B)     Qwen3 0.6B/1.7B/4B
MoE Variant               27B (3.8B active)      No

Why Gemma 4

Multimodal Native

Every model understands text and images. Edge models add native audio. No separate pipelines needed.

256K Context Window

Process entire codebases, research papers, or hours of conversation history in a single prompt.

140+ Languages

Broad multilingual support including CJK, Arabic, Hindi, and 130+ more languages.

Apache 2.0 License

Fully permissive. No user count limits, no commercial restrictions. Unlike Llama's 700M MAU cap.

On-Device Ready

E2B runs on 8GB RAM devices. Quantized 31B fits on a single RTX 4090. NVIDIA, AMD, and Apple Silicon supported.

Built-in Reasoning

Thinking mode for step-by-step reasoning. AIME jumped from 20.8% (Gemma 3) to 89.2% (Gemma 4).

Agentic Workflows

Native function calling, structured JSON output, and multi-step planning for tool-use agents.

Fine-tuning Ready

LoRA, QLoRA, full SFT supported. Works with HuggingFace PEFT, Keras, Unsloth, and NVIDIA NeMo.
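The LoRA technique named above can be sketched without any framework: the frozen weight W receives a low-rank update (alpha/r)·B·A, and only the small factors A and B are trained. All dimensions here are illustrative, not Gemma's:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16             # hidden size, LoRA rank, scaling (all illustrative)

W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection; zero init => adapter is a no-op

def adapted_forward(x):
    """Forward pass with the LoRA update W + (alpha/r) * B @ A folded in."""
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.normal(size=(1, d))
baseline = x @ W.T                  # frozen-model output; equals adapter output at init
```

The payoff: 2·d·r trainable parameters instead of d², which is what lets a quantized 31B fine-tune fit on a single consumer GPU.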

Gemma 4 Use Cases

Code Generation

94.1% HumanEval, 2150 Codeforces ELO. Generates, reviews, and debugs code. Outperforms GPT-4o on coding benchmarks.

RAG & Document Q&A

256K context + function calling. Ingest entire PDFs, codebases, or knowledge bases. Structured JSON output for pipelines.

Multimodal Chat

Understand images, charts, diagrams, and video frames. Describe photos, extract data from screenshots, analyze documents.

On-Device AI

E2B and E4B with native audio run on phones, Raspberry Pi, and Jetson. Offline-capable with <1.5GB RAM footprint.

Enterprise Deployment

Vertex AI managed serving, sovereign cloud ready. Apache 2.0 means no license headaches. 15+ frameworks supported day one.

Research & Fine-tuning

Base models for custom training. Used by Yale (cancer research), INSAIT (BgGPT for Bulgarian), and 100K+ community variants.

Gemma 4 Architecture

Per-Layer Embeddings (PLE): Each decoder layer gets its own small embedding table. E2B has 5.1B total params but only 2.3B effective compute — more capacity without more inference cost.
Mixture of Experts (MoE): 27B model has 128 experts with 8 active per token. Only 3.8B parameters activate per forward pass, delivering near-31B quality at ~4B compute.
Hybrid Attention: Interleaves local sliding-window (512/1024 tokens) with global full-context attention. Fast processing + deep long-range understanding.
Shared KV Cache: Keys=Values constraint in global attention layers halves KV cache memory. Enables longer context on less hardware.
Dual RoPE: Combines standard RoPE with proportional extension (p=0.25). Better position encoding for long sequences up to 256K tokens.
Vision Encoder: ViT with 16×16 patches and 2D RoPE. Configurable token budgets (70–1120) let you trade image detail for speed.
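The interleaved local/global scheme above can be sketched as causal attention masks. The layer pattern and tiny sizes here are illustrative; only the sliding-window idea comes from the architecture description:

```python
def attention_mask(seq_len, window=None):
    """Causal mask: window=None gives global attention; an int gives a sliding window."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q and (window is None or q - k < window)
            row.append(visible)
        mask.append(row)
    return mask

# Hypothetical interleaving: a few local layers (512-token window) per global layer.
LAYER_PATTERN = ["local"] * 5 + ["global"]

local_mask = attention_mask(8, window=4)  # tiny sizes for illustration
global_mask = attention_mask(8)           # full causal attention
```

Local layers only attend over the last `window` tokens, so their KV cache stays constant-size; the occasional global layer stitches long-range context back together.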

Get Started with Gemma 4 Free

# Install and run the flagship model
ollama run gemma4:31b

# Or the efficient MoE variant
ollama run gemma4:27b

# Edge model for lightweight devices
ollama run gemma4:4b
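Beyond the CLI, a local Ollama install also serves a REST API. A minimal sketch of building a request for its documented `/api/generate` endpoint; the network call itself is left commented out since it needs a running server, and the model tag is the one used above:

```python
import json

def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's local REST endpoint: POST http://localhost:11434/api/generate"""
    return {"model": model, "prompt": prompt, "stream": False}

payload = build_generate_request("gemma4:31b", "Summarize this repo's README.")
body = json.dumps(payload)

# To actually send it (requires a running Ollama server):
#   import urllib.request
#   req = urllib.request.Request("http://localhost:11434/api/generate",
#                                data=body.encode(),
#                                headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

With `"stream": False` the server returns one JSON object containing the full response; omit it to receive newline-delimited streaming chunks.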

Gemma 4 FAQ
