OpenClaw imageModel Configuration Guide 2026
🎯 Key Takeaways (TL;DR)
- imageModel is OpenClaw's dedicated configuration for vision understanding, separate from the main conversation model
- Configure imageModel to enable "fast text models + capable vision models" for optimal speed and capability
- Use CLI commands like openclaw models set-image, or edit the config directly, to manage vision models
Table of Contents
- What is imageModel
- Why Separate Configuration
- Configuration Methods
- CLI Management Commands
- Trigger Scenarios
- Fallback Logic
- Relationship with pdfModel
- Built-in Default Image Models
- Complete Configuration Examples
What is imageModel
imageModel is OpenClaw's dedicated model configuration for visual understanding, operating independently from the main conversation model (model). When conversations involve images or visual content, OpenClaw automatically switches to the model specified by imageModel to process the visual input.
This separation allows you to optimize your AI assistant for both speed (using fast text-only models for regular conversations) and capability (using multimodal models when images are involved).
Why Separate Configuration
Your primary model (model.primary) may not support visual input. For example:
- MiniMax-M2.5-highspeed is a text-only model and cannot process images
- moonshot/kimi-k2.5 supports multimodal (text + images)
Configuring imageModel separately gives you both:
💡 Key Benefit
Text goes through fast models; images go through multimodal models. This balances speed and capability.
This is particularly useful when:
- You want to use cost-effective text models for most conversations
- You need capable vision models only when processing images
- You want to configure fallback chains for vision tasks
Configuration Methods
In your OpenClaw configuration file (edit via openclaw config edit):
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "minimax-portal/MiniMax-M2.5-highspeed",
        "fallbacks": ["moonshot/kimi-k2.5", "anthropic/claude-opus-4-6"]
      },
      "imageModel": {
        "primary": "moonshot/kimi-k2.5",
        "fallbacks": ["openrouter/qwen/qwen-2.5-vl-72b-instruct:free"]
      }
    }
  }
}
```
Two Syntax Options
Shorthand (primary model only, no fallback):
```json
"imageModel": "moonshot/kimi-k2.5"
```
Full syntax (primary + fallback chain):
```json
"imageModel": {
  "primary": "moonshot/kimi-k2.5",
  "fallbacks": ["openrouter/google/gemini-2.0-flash-vision:free"]
}
```
Both formats are supported. The full syntax provides redundancy for vision tasks.
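To make the equivalence between the two syntaxes concrete, here is a minimal sketch of how a tool might normalize either form into one canonical shape. The function name and output shape are assumptions for illustration, not OpenClaw's actual internals.

```python
def normalize_image_model(value):
    """Accept either the shorthand string or the full object form
    and return a canonical {primary, fallbacks} dict."""
    if isinstance(value, str):
        # Shorthand: primary model only, no fallback chain
        return {"primary": value, "fallbacks": []}
    return {
        "primary": value.get("primary"),
        "fallbacks": list(value.get("fallbacks", [])),
    }

# Both forms normalize to the same structure (modulo fallbacks):
normalize_image_model("moonshot/kimi-k2.5")
normalize_image_model({"primary": "moonshot/kimi-k2.5", "fallbacks": []})
```

Both calls above yield the same dict, which is why the shorthand is safe to use whenever you don't need a fallback chain.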
CLI Management Commands
OpenClaw provides convenient CLI commands for managing imageModel:
```shell
# View current imageModel status
openclaw models status

# Set imageModel primary model
openclaw models set-image moonshot/kimi-k2.5

# Manage imageModel fallback chain
openclaw models image-fallbacks list
openclaw models image-fallbacks add openrouter/qwen/qwen-2.5-vl-72b-instruct:free
openclaw models image-fallbacks remove openrouter/qwen/qwen-2.5-vl-72b-instruct:free
openclaw models image-fallbacks clear
```
These commands make it easy to switch vision models without manually editing config files.
Trigger Scenarios
| Scenario | Description |
|---|---|
| User sends images | Photos, screenshots, or image attachments where the agent needs to "see and describe" |
| User sends PDF | PDFs containing scanned pages/images requiring visual analysis (checks pdfModel first, falls back to imageModel) |
| Media understanding pipeline | Automatic media understanding when images/video frames are received |
| Agent tool calls | When agents use the built-in image tool to analyze images |
⚠️ Important
PDF handling follows this priority: pdfModel → imageModel → built-in provider default. If no pdfModel is configured, OpenClaw automatically falls back to imageModel.
Fallback Logic
The fallback chain works as follows:
imageModel.primary → imageModel.fallbacks[0] → imageModel.fallbacks[1] → ...
OpenClaw tries each model sequentially, returning the first successful response. If all models fail, you'll receive this error:
Error: "No image model configured. Set agents.defaults.imageModel.primary or agents.defaults.imageModel.fallbacks."
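The sequential try-each-model behavior described above can be sketched as follows. This is an illustrative model of the logic, not OpenClaw's source; call_model is a stand-in for a real provider call.

```python
def resolve_with_fallbacks(image_model, call_model):
    """Try primary, then each fallback in order; return the first success."""
    chain = [image_model.get("primary"), *image_model.get("fallbacks", [])]
    chain = [m for m in chain if m]  # drop unset entries
    if not chain:
        raise RuntimeError(
            "No image model configured. Set agents.defaults.imageModel.primary "
            "or agents.defaults.imageModel.fallbacks."
        )
    last_error = None
    for model_id in chain:
        try:
            return call_model(model_id)  # first successful response wins
        except Exception as err:
            last_error = err  # move on to the next model in the chain
    raise last_error  # every model in the chain failed
```

Note that the error only fires when the chain is empty; if models exist but all fail, the last provider error propagates instead.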
Relationship with pdfModel
PDF processing follows this priority:
pdfModel → imageModel → built-in provider default
If you don't configure pdfModel, the PDF tool will automatically fall back to the imageModel configuration. This design ensures consistent vision model handling across different file types.
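The pdfModel → imageModel → provider-default priority can be sketched as a simple lookup. The function and argument names are hypothetical, chosen only to mirror the config keys shown in this guide.

```python
def pick_pdf_model(config, provider_default=None):
    """Resolve the model used for PDF processing, in priority order:
    pdfModel, then imageModel, then the built-in provider default."""
    defaults = config.get("agents", {}).get("defaults", {})
    return (
        defaults.get("pdfModel")
        or defaults.get("imageModel")
        or provider_default
    )
```

Because the lookup falls through in order, configuring only imageModel is enough to cover PDFs as well, which is the consistency property the section above describes.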
Built-in Default Image Models
When imageModel is not configured and the system detects the corresponding provider's API key, OpenClaw uses built-in defaults:
| Provider | Default Model |
|---|---|
| OpenAI | gpt-5-mini |
| Anthropic | claude-opus-4-6 |
| Google | gemini-3-flash-preview |
| MiniMax | MiniMax-VL-01 |
| ZAI | glm-4.6v |
These defaults ensure vision capabilities work out of the box when you have API keys configured.
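The table above can be expressed as a lookup, together with a sketch of how a client might select the first provider whose API key is present. The environment-variable names here are assumptions for illustration; check your provider's documentation for the actual variables.

```python
# Built-in defaults from the table above
BUILTIN_IMAGE_DEFAULTS = {
    "openai": "gpt-5-mini",
    "anthropic": "claude-opus-4-6",
    "google": "gemini-3-flash-preview",
    "minimax": "MiniMax-VL-01",
    "zai": "glm-4.6v",
}

# Hypothetical mapping from provider to API-key environment variable
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GEMINI_API_KEY",
    "minimax": "MINIMAX_API_KEY",
    "zai": "ZAI_API_KEY",
}

def default_image_model(env):
    """Return the built-in default for the first provider with a key set."""
    for provider, var in API_KEY_VARS.items():
        if env.get(var):
            return BUILTIN_IMAGE_DEFAULTS[provider]
    return None  # no provider key detected; imageModel must be set explicitly

default_image_model({"ANTHROPIC_API_KEY": "sk-..."})
```

In practice you would pass os.environ as env; the dict argument here just keeps the sketch testable.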
Complete Configuration Examples
Here's a comprehensive configuration example:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "minimax-portal/MiniMax-M2.5-highspeed",
        "fallbacks": ["moonshot/kimi-k2.5", "anthropic/claude-opus-4-6"]
      },
      "imageModel": {
        "primary": "moonshot/kimi-k2.5",
        "fallbacks": ["openrouter/google/gemini-2.0-flash-vision:free"]
      },
      "pdfModel": {
        "primary": "anthropic/claude-opus-4-6"
      },
      "models": {
        "moonshot/kimi-k2.5": { "alias": "kimi" },
        "minimax-portal/MiniMax-M2.5-highspeed": { "alias": "mm" }
      }
    }
  }
}
```
Expected Behavior
With this configuration:
- Text conversations → MiniMax-M2.5-highspeed (fast, text-only)
- Sending images → moonshot/kimi-k2.5, with fallback to gemini-2.0-flash-vision if it fails
- Sending PDFs → claude-opus-4-6; if pdfModel were not configured, PDFs would fall back to the imageModel chain
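The routing decision behind the behavior listed above reduces to a small branch: conversations with visual content use imageModel, everything else uses the main model. This is a minimal sketch of that decision under assumed names, not OpenClaw's actual routing code.

```python
def route_model(defaults, has_image):
    """Pick the model for a turn: imageModel when visual content is present,
    otherwise the primary conversation model."""
    if has_image:
        return defaults["imageModel"]["primary"]
    return defaults["model"]["primary"]

defaults = {
    "model": {"primary": "minimax-portal/MiniMax-M2.5-highspeed"},
    "imageModel": {"primary": "moonshot/kimi-k2.5"},
}
route_model(defaults, has_image=False)  # fast text model
route_model(defaults, has_image=True)   # multimodal vision model
```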
🤔 FAQ
Q: Can I use the same model for both text and images?
A: Yes, if your primary model supports multimodal input (like moonshot/kimi-k2.5 or anthropic/claude-opus-4-6), you can set both model.primary and imageModel.primary to the same value.
Q: What happens if I don't configure imageModel?
A: OpenClaw will use built-in default models based on your configured API providers. However, explicitly configuring imageModel gives you more control over which vision model is used.
Q: How do free vision models work in the fallback chain?
A: Models like openrouter/google/gemini-2.0-flash-vision:free or openrouter/qwen/qwen-2.5-vl-72b-instruct:free are free tier models from OpenRouter. They're useful as fallbacks when your primary vision model fails or is unavailable.
Summary & Recommendations
Configuring imageModel in OpenClaw is essential for:
- Cost optimization: use fast, cheap text models for regular conversations
- Capability assurance: ensure vision tasks always reach a capable model
- Redundancy: set up fallback chains to prevent single points of failure
Recommended next steps:
- Check your current configuration with openclaw models status
- Configure a primary imageModel if you haven't already
- Add fallback models for redundancy
- Test with image inputs to verify the configuration works
For more details, visit the OpenClaw Documentation.
Originally published at: OpenClaw imageModel Configuration Guide 2026