
OpenClaw imageModel Configuration Guide 2026

🎯 Key Takeaways (TL;DR)

  • imageModel is OpenClaw's dedicated configuration for vision understanding, separate from the main conversation model
  • Configure imageModel to enable "fast text models + capable vision models" for optimal speed and capability
  • Use CLI commands like openclaw models set-image or edit config directly to manage vision models

Table of Contents

  1. What is imageModel
  2. Why Separate Configuration
  3. Configuration Methods
  4. CLI Management Commands
  5. Trigger Scenarios
  6. Fallback Logic
  7. Relationship with pdfModel
  8. Built-in Default Image Models
  9. Complete Configuration Examples

What is imageModel

imageModel is OpenClaw's dedicated model configuration for visual understanding, operating independently from the main conversation model (model). When conversations involve images or visual content, OpenClaw automatically switches to the model specified by imageModel to process the visual input.

This separation allows you to optimize your AI assistant for both speed (using fast text-only models for regular conversations) and capability (using multimodal models when images are involved).

Why Separate Configuration

Your primary model (model.primary) may not support visual input. For example:

  • MiniMax-M2.5-highspeed is a text-only model and cannot process images
  • moonshot/kimi-k2.5 supports multimodal (text + images)

Configuring imageModel separately enables you to achieve:

💡 Key Benefit
Text goes through fast models and images go through multimodal models, balancing speed and capability.

This is particularly useful when:

  • You want to use cost-effective text models for most conversations
  • You need capable vision models only when processing images
  • You want to configure fallback chains for vision tasks
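The split described above can be sketched as a simple routing function. This is an illustrative sketch, not OpenClaw's actual internals: the `pick_model` helper and the `has_image` flag are assumptions, while the config keys and model IDs match the examples in this guide.

```python
# Hypothetical sketch of the routing: text-only turns use the fast primary
# model, turns containing images use the imageModel configuration instead.
def pick_model(config: dict, has_image: bool) -> str:
    if has_image and "imageModel" in config:
        image_model = config["imageModel"]
        # imageModel may be a plain string (shorthand) or an object (full syntax)
        if isinstance(image_model, str):
            return image_model
        return image_model["primary"]
    return config["model"]["primary"]

config = {
    "model": {"primary": "minimax-portal/MiniMax-M2.5-highspeed"},
    "imageModel": {"primary": "moonshot/kimi-k2.5"},
}
print(pick_model(config, has_image=False))  # fast text model
print(pick_model(config, has_image=True))   # multimodal model
```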

Configuration Methods

In your OpenClaw configuration file (edit via openclaw config edit):

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "minimax-portal/MiniMax-M2.5-highspeed",
        "fallbacks": ["moonshot/kimi-k2.5", "anthropic/claude-opus-4-6"]
      },
      "imageModel": {
        "primary": "moonshot/kimi-k2.5",
        "fallbacks": ["openrouter/qwen/qwen-2.5-vl-72b-instruct:free"]
      }
    }
  }
}
```

Two Syntax Options

Shorthand (primary model only, no fallback):

```json
"imageModel": "moonshot/kimi-k2.5"
```

Full syntax (primary + fallback chain):

```json
"imageModel": {
  "primary": "moonshot/kimi-k2.5",
  "fallbacks": ["openrouter/google/gemini-2.0-flash-vision:free"]
}
```

Both formats are supported. The full syntax provides redundancy for vision tasks.
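A small sketch shows how the two forms can be normalized into one shape before use. The `normalize_image_model` helper is an assumption for illustration; only the field names (`primary`, `fallbacks`) come from the config snippets above.

```python
# Normalize the two accepted imageModel forms into a single shape:
# a bare string becomes a primary with an empty fallback chain.
def normalize_image_model(value):
    if isinstance(value, str):  # shorthand: just a model ID
        return {"primary": value, "fallbacks": []}
    return {
        "primary": value["primary"],
        "fallbacks": value.get("fallbacks", []),
    }

print(normalize_image_model("moonshot/kimi-k2.5"))
print(normalize_image_model({
    "primary": "moonshot/kimi-k2.5",
    "fallbacks": ["openrouter/google/gemini-2.0-flash-vision:free"],
}))
```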

CLI Management Commands

OpenClaw provides convenient CLI commands for managing imageModel:

```bash
# View current imageModel status
openclaw models status

# Set imageModel primary model
openclaw models set-image moonshot/kimi-k2.5

# Manage imageModel fallback chain
openclaw models image-fallbacks list
openclaw models image-fallbacks add openrouter/qwen/qwen-2.5-vl-72b-instruct:free
openclaw models image-fallbacks remove openrouter/qwen/qwen-2.5-vl-72b-instruct:free
openclaw models image-fallbacks clear
```

These commands make it easy to switch vision models without manually editing config files.

Trigger Scenarios

| Scenario | Description |
| --- | --- |
| User sends images | Photos, screenshots, or image attachments where the agent needs to "see and describe" |
| User sends a PDF | PDFs containing scanned pages/images requiring visual analysis (checks pdfModel first, falls back to imageModel) |
| Media understanding pipeline | Automatic media understanding when images/video frames are received |
| Agent tool calls | When agents use the built-in image tool to analyze images |

⚠️ Important
PDF handling follows this priority: pdfModel → imageModel → built-in provider default. If no pdfModel is configured, OpenClaw automatically falls back to imageModel.

Fallback Logic

The fallback chain works as follows:

imageModel.primary → imageModel.fallbacks[0] → imageModel.fallbacks[1] → ...

OpenClaw tries each model sequentially, returning the first successful response. If all models fail, you'll receive this error:

Error: "No image model configured. Set agents.defaults.imageModel.primary or agents.defaults.imageModel.fallbacks."
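The sequential try-and-fall-back behavior can be sketched as follows. This is a minimal illustration under stated assumptions: `try_with_fallbacks` and `call_model` are hypothetical names, and only the chain order and the error text come from this guide.

```python
# Illustrative sketch of the fallback chain: try the primary, then each
# fallback in order, and return the first successful response.
def try_with_fallbacks(image_model: dict, call_model) -> str:
    chain = [image_model["primary"]] + image_model.get("fallbacks", [])
    for model_id in chain:
        try:
            return call_model(model_id)
        except Exception:
            continue  # a real client would log and inspect each failure
    raise RuntimeError(
        "No image model configured. Set agents.defaults.imageModel.primary "
        "or agents.defaults.imageModel.fallbacks."
    )

# Simulated client: the primary times out, so the fallback answers.
def fake_call(model_id):
    if model_id == "moonshot/kimi-k2.5":
        raise TimeoutError("primary unavailable")
    return f"response from {model_id}"

cfg = {
    "primary": "moonshot/kimi-k2.5",
    "fallbacks": ["openrouter/qwen/qwen-2.5-vl-72b-instruct:free"],
}
print(try_with_fallbacks(cfg, fake_call))
```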

Relationship with pdfModel

PDF processing follows this priority:

pdfModel → imageModel → built-in provider default

If you don't configure pdfModel, the PDF tool will automatically fall back to the imageModel configuration. This design ensures consistent vision model handling across different file types.
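The priority order above can be sketched as a short resolution function. The `resolve_pdf_model` name is an assumption for illustration; the lookup order (`pdfModel`, then `imageModel`, then the built-in default) is exactly what this section describes.

```python
# Sketch of the PDF model resolution order:
# pdfModel -> imageModel -> built-in provider default.
def resolve_pdf_model(defaults: dict, builtin_default: str) -> str:
    for key in ("pdfModel", "imageModel"):
        value = defaults.get(key)
        if value:
            # each entry may be a shorthand string or a full object
            return value if isinstance(value, str) else value["primary"]
    return builtin_default

# With no pdfModel configured, the imageModel primary is used.
defaults = {"imageModel": {"primary": "moonshot/kimi-k2.5"}}
print(resolve_pdf_model(defaults, "anthropic/claude-opus-4-6"))
```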

Built-in Default Image Models

When imageModel is not configured and the system detects the corresponding provider's API key, OpenClaw uses built-in defaults:

| Provider | Default Model |
| --- | --- |
| OpenAI | gpt-5-mini |
| Anthropic | claude-opus-4-6 |
| Google | gemini-3-flash-preview |
| MiniMax | MiniMax-VL-01 |
| ZAI | glm-4.6v |

These defaults ensure vision capabilities work out of the box when you have API keys configured.
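The defaults table above can be expressed as a simple lookup. How OpenClaw actually detects API keys is not shown here; the provider keys and the `builtin_image_default` helper are illustrative assumptions, while the model IDs come from the table.

```python
# The built-in defaults table as a lookup; which entry applies depends on
# which provider API key is detected (detection logic not sketched here).
BUILTIN_IMAGE_DEFAULTS = {
    "openai": "gpt-5-mini",
    "anthropic": "claude-opus-4-6",
    "google": "gemini-3-flash-preview",
    "minimax": "MiniMax-VL-01",
    "zai": "glm-4.6v",
}

def builtin_image_default(detected_providers):
    """Return the default vision model for the first recognized provider."""
    for provider in detected_providers:
        if provider in BUILTIN_IMAGE_DEFAULTS:
            return BUILTIN_IMAGE_DEFAULTS[provider]
    return None

print(builtin_image_default(["anthropic"]))
```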

Complete Configuration Examples

Here's a comprehensive configuration example:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "minimax-portal/MiniMax-M2.5-highspeed",
        "fallbacks": ["moonshot/kimi-k2.5", "anthropic/claude-opus-4-6"]
      },
      "imageModel": {
        "primary": "moonshot/kimi-k2.5",
        "fallbacks": ["openrouter/google/gemini-2.0-flash-vision:free"]
      },
      "pdfModel": {
        "primary": "anthropic/claude-opus-4-6"
      },
      "models": {
        "moonshot/kimi-k2.5": { "alias": "kimi" },
        "minimax-portal/MiniMax-M2.5-highspeed": { "alias": "mm" }
      }
    }
  }
}
```

Expected Behavior

With this configuration:

  • Text conversations → MiniMax-M2.5-highspeed (fast, text-only)
  • Sending images → moonshot/kimi-k2.5, with a fallback to gemini-2.0-flash-vision if it fails
  • Sending PDFs → claude-opus-4-6; if pdfModel were not configured, OpenClaw would fall back to the imageModel chain

🤔 FAQ

Q: Can I use the same model for both text and images?

A: Yes, if your primary model supports multimodal input (like moonshot/kimi-k2.5 or anthropic/claude-opus-4-6), you can set both model.primary and imageModel.primary to the same value.

Q: What happens if I don't configure imageModel?

A: OpenClaw will use built-in default models based on your configured API providers. However, explicitly configuring imageModel gives you more control over which vision model is used.

Q: How do free vision models work in the fallback chain?

A: Models like openrouter/google/gemini-2.0-flash-vision:free or openrouter/qwen/qwen-2.5-vl-72b-instruct:free are free tier models from OpenRouter. They're useful as fallbacks when your primary vision model fails or is unavailable.


Summary & Recommendations

Configuring imageModel in OpenClaw is essential for:

  1. Cost optimization: use fast, cheap text models for regular conversations
  2. Capability assurance: ensure vision tasks always have a capable model
  3. Redundancy: set up fallback chains to prevent single points of failure

Recommended next steps:

  • Check your current configuration with openclaw models status
  • Configure a primary imageModel if you haven't already
  • Add fallback models for redundancy
  • Test with image inputs to verify the configuration works

For more details, visit the OpenClaw Documentation.


Originally published at: OpenClaw imageModel Configuration Guide 2026
