On April 2, 2026, alongside Gemma 4, Google released a major update to AI Edge Gallery, its app for running open-source Large Language Models on your mobile device. The iOS app arrived in February 2026, expanding from Android.
All model inference happens directly on your device hardware. No internet connection is required for inference, which keeps your prompts, images, and sensitive data on the device.
Your phone is no longer just an AI terminal connecting to cloud services—it's becoming an autonomous AI device.
What is AI Edge Gallery?
AI Edge Gallery is Google's experimental app (released March 2025, 500K+ downloads) that lets you run open-source LLMs on mobile devices. The April 2026 update brought official Gemma 4 support, making it the most capable on-device AI experience available.
Core Features
| Feature | Description |
|---|---|
| Agent Skills | Extend LLMs with tools: Wikipedia, maps, web search, custom skills |
| Thinking Mode | Visualize the model's reasoning process |
| Prompt Lab | Test prompts with granular control over temperature, top-k |
| Model Management | Download, benchmark, and manage models locally |
| 100% Offline | All inference on-device, no internet required |
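The temperature and top-k controls in Prompt Lab correspond to standard sampling knobs. As a rough illustration only (this is a generic sampler, not the app's actual implementation), the sketch below applies both to a toy set of logits. The function name `sample_top_k` and the example scores are hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=1.0, top_k=40, rng=None):
    """Pick a token index from raw logits using temperature and top-k.

    Illustrative only: a generic sampler, not AI Edge Gallery's code.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()

    # Keep only the k highest-scoring candidates.
    k = max(1, min(top_k, len(logits)))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

    # Scale by temperature, then apply a numerically stable softmax.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # Draw one index in proportion to its weight.
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [2.0, 0.5, 1.0, -1.0]  # hypothetical scores for four tokens
greedy = sample_top_k(toy_logits, temperature=0.7, top_k=1)
```

With `top_k=1` only the highest-scoring token survives, so the choice is deterministic. Raising `top_k` or `temperature` widens the set of tokens that can be drawn.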
Supported Models (April 2026)
- Gemma 4 family (E2B, E4B, 26B A4B, 31B)
- FunctionGemma variants
- Community models via Hugging Face integration
Why On-Device AI Matters
The Privacy Case
Every cloud AI interaction involves data leaving your device. With AI Edge Gallery:
- Zero data transmission: Your prompts never leave your phone
- Completely offline: Works in airplane mode. No signal? No problem
- Sensitive data: Analyze documents, contracts, medical info—all local
The Latency Case
- Cloud AI: Input → Upload → Process → Download → Output
- On-Device: Input → Process → Output
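The two pipelines can be compared with simple arithmetic. The sketch below is a back-of-the-envelope model, not a benchmark: the function names are hypothetical and the timings in the usage line are placeholders, not measurements.

```python
def cloud_latency(upload_s, process_s, download_s):
    """Total time for a cloud round trip: upload, process, download."""
    return upload_s + process_s + download_s


def on_device_is_faster(local_process_s, upload_s, cloud_process_s, download_s):
    """True when local processing beats the full cloud round trip."""
    return local_process_s < cloud_latency(upload_s, cloud_process_s, download_s)


# Placeholder timings in seconds, chosen only to exercise the comparison.
faster = on_device_is_faster(
    local_process_s=1.0, upload_s=0.3, cloud_process_s=0.5, download_s=0.3
)
```

The comparison makes the trade-off explicit: removing the network legs helps only as long as local processing does not take longer than the whole cloud round trip.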
The Cost Case
Cloud API calls add up. On-device inference needs a one-time model download, after which there are no per-request fees.
Agent Skills: Extending the LLM
The Agent Skills system transforms conversational LLMs into capable agents:
Built-in Skills
- Wikipedia: Fact-grounded responses
- Interactive Maps: Location-aware AI
- Web Search: Real-time information
- Custom Skills: Load from URL or community repos
How Skills Work
When you enable a skill, the LLM gains function-calling capabilities automatically:
User: "What's the population of Tokyo?"

Agent Skill activates:

1. Recognizes question requires factual data
2. Calls Wikipedia skill
3. Retrieves current population
4. Formats response with citation
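The flow above can be sketched as a small dispatcher. This is an assumption about how function calling might be wired up, not the app's documented internals: the JSON call format, the `SKILLS` registry, and the `wikipedia_lookup` stub (which makes no network request and returns a placeholder) are all hypothetical.

```python
import json


def wikipedia_lookup(query):
    """Stand-in for a Wikipedia skill; a real one would fetch an article."""
    return {"source": "Wikipedia", "summary": f"<article summary for {query!r}>"}


# Hypothetical registry mapping skill names to Python callables.
SKILLS = {"wikipedia": wikipedia_lookup}


def dispatch(model_output):
    """Run the skill named in a model's (assumed) JSON function call."""
    call = json.loads(model_output)
    handler = SKILLS.get(call["name"])
    if handler is None:
        return {"error": f"unknown skill: {call['name']}"}
    return handler(**call.get("arguments", {}))


# Assumed shape of what the model emits when it decides a skill is needed.
model_output = '{"name": "wikipedia", "arguments": {"query": "Tokyo population"}}'
result = dispatch(model_output)
```

The returned dictionary carries its source alongside the content, which is what would let the final response include a citation.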
Thinking Mode: Seeing Inside the Model
One of the most compelling features is Thinking Mode: tap the toggle to watch the model reason in real time.
This shows:
- How the model breaks down the problem
- Intermediate reasoning steps
- Confidence adjustments
Performance and Hardware
What You Need
| Model | Recommended Device | Performance |
|---|---|---|
| E2B | Any modern phone | ~30 tokens/sec |
| E4B | iPhone 15+, Pixel 7+ | ~20 tokens/sec |
| 26B A4B | iPhone 15 Pro, high-end Android | ~8 tokens/sec |
| 31B | Not recommended | Too demanding |
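To translate the throughput figures in the table above into waiting time, divide the response length by tokens per second. The sketch below does that arithmetic using the table's approximate speeds. It ignores prompt-processing time, so treat the results as rough lower bounds. The helper name `generation_seconds` is hypothetical.

```python
# Approximate decode speeds copied from the table above (tokens per second).
TOKENS_PER_SEC = {"E2B": 30, "E4B": 20, "26B A4B": 8}


def generation_seconds(model, num_tokens):
    """Estimate seconds to generate num_tokens at the table's speed."""
    return num_tokens / TOKENS_PER_SEC[model]


for model in TOKENS_PER_SEC:
    print(f"{model}: {generation_seconds(model, 300):.1f} s for 300 tokens")
```

For a 300-token reply this works out to roughly 10 seconds on E2B, 15 seconds on E4B, and 37.5 seconds on 26B A4B.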
Use Cases That Work
1. Personal AI Assistant
Keep a lightweight model (E2B/E4B) always ready:
- Quick Q&A without internet
- Draft emails, review documents
2. Developer Sandbox
- Test prompts and model behaviors in isolation
- Prototype before cloud deployment
3. Privacy-First Workflows
- Legal document review
- Medical record summarization
- Financial analysis
4. Offline Development
- Airplane coding sessions
- Security-sensitive environments
The Developer Opportunity
AI Edge Gallery isn't just an app—it's a proving ground for on-device AI development.
Building Custom Skills

    from skill_runtime import skill

    @skill("analyze_image")
    def analyze_image(image_path):
        # Custom ML pipeline
        return {"objects": [...], "description": ...}

AI Edge Gallery is available on Google Play and the iOS App Store (iOS 17+).