
OpenClaw imageModel Configuration Guide 2026

🎯 Key Takeaways (TL;DR)

  • imageModel is OpenClaw's dedicated configuration for vision understanding, separate from the main conversation model
  • Configure imageModel to enable "fast text models + capable vision models" for optimal speed and capability
  • Use CLI commands like openclaw models set-image or edit config directly to manage vision models

Table of Contents

  1. What is imageModel
  2. Why Separate Configuration
  3. Configuration Methods
  4. CLI Management Commands
  5. Trigger Scenarios
  6. Fallback Logic
  7. Relationship with pdfModel
  8. Built-in Default Image Models
  9. Complete Configuration Examples

What is imageModel

imageModel is OpenClaw's dedicated model configuration for visual understanding, operating independently from the main conversation model (model). When conversations involve images or visual content, OpenClaw automatically switches to the model specified by imageModel to process the visual input.

This separation allows you to optimize your AI assistant for both speed (using fast text-only models for regular conversations) and capability (using multimodal models when images are involved).

Why Separate Configuration

Your primary model (model.primary) may not support visual input. For example:

  • MiniMax-M2.5-highspeed is a text-only model and cannot process images
  • moonshot/kimi-k2.5 supports multimodal (text + images)

Configuring imageModel separately enables you to achieve:

💡 Key Benefit
Text goes through fast models and images go through multimodal models, balancing speed and capability.

This is particularly useful when:

  • You want to use cost-effective text models for most conversations
  • You need capable vision models only when processing images
  • You want to configure fallback chains for vision tasks
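The split described above can be sketched as a simple routing function. This is an illustrative sketch, not OpenClaw's actual internals: the `pick_model` helper and the `has_image` flag are assumptions, while the config keys and model IDs match the examples in this guide.

```python
# Hypothetical sketch of the routing: text-only turns use the fast primary
# model, turns containing images use the imageModel configuration instead.
def pick_model(config: dict, has_image: bool) -> str:
    if has_image and "imageModel" in config:
        image_model = config["imageModel"]
        # imageModel may be a plain string (shorthand) or an object (full syntax)
        if isinstance(image_model, str):
            return image_model
        return image_model["primary"]
    return config["model"]["primary"]

config = {
    "model": {"primary": "minimax-portal/MiniMax-M2.5-highspeed"},
    "imageModel": {"primary": "moonshot/kimi-k2.5"},
}
print(pick_model(config, has_image=False))  # fast text model
print(pick_model(config, has_image=True))   # multimodal model
```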

Configuration Methods

In your OpenClaw configuration file (edit via openclaw config edit):

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "minimax-portal/MiniMax-M2.5-highspeed",
        "fallbacks": ["moonshot/kimi-k2.5", "anthropic/claude-opus-4-6"]
      },
      "imageModel": {
        "primary": "moonshot/kimi-k2.5",
        "fallbacks": ["openrouter/qwen/qwen-2.5-vl-72b-instruct:free"]
      }
    }
  }
}
```

Two Syntax Options

Shorthand (primary model only, no fallback):

```json
"imageModel": "moonshot/kimi-k2.5"
```

Full syntax (primary + fallback chain):

```json
"imageModel": {
  "primary": "moonshot/kimi-k2.5",
  "fallbacks": ["openrouter/google/gemini-2.0-flash-vision:free"]
}
```

Both formats are supported. The full syntax provides redundancy for vision tasks.
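A small sketch shows how the two forms can be normalized into one shape before use. The `normalize_image_model` helper is an assumption for illustration; only the field names (`primary`, `fallbacks`) come from the config snippets above.

```python
# Normalize the two accepted imageModel forms into a single shape:
# a bare string becomes a primary with an empty fallback chain.
def normalize_image_model(value):
    if isinstance(value, str):  # shorthand: just a model ID
        return {"primary": value, "fallbacks": []}
    return {
        "primary": value["primary"],
        "fallbacks": value.get("fallbacks", []),
    }

print(normalize_image_model("moonshot/kimi-k2.5"))
print(normalize_image_model({
    "primary": "moonshot/kimi-k2.5",
    "fallbacks": ["openrouter/google/gemini-2.0-flash-vision:free"],
}))
```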

CLI Management Commands

OpenClaw provides convenient CLI commands for managing imageModel:

```bash
# View current imageModel status
openclaw models status

# Set imageModel primary model
openclaw models set-image moonshot/kimi-k2.5

# Manage imageModel fallback chain
openclaw models image-fallbacks list
openclaw models image-fallbacks add openrouter/qwen/qwen-2.5-vl-72b-instruct:free
openclaw models image-fallbacks remove openrouter/qwen/qwen-2.5-vl-72b-instruct:free
openclaw models image-fallbacks clear
```

These commands make it easy to switch vision models without manually editing config files.

Trigger Scenarios

| Scenario | Description |
| --- | --- |
| User sends images | Photos, screenshots, or image attachments where the agent needs to "see and describe" |
| User sends a PDF | PDFs containing scanned pages/images requiring visual analysis (checks pdfModel first, falls back to imageModel) |
| Media understanding pipeline | Automatic media understanding when images/video frames are received |
| Agent tool calls | When agents use the built-in image tool to analyze images |

⚠️ Important
PDF handling follows this priority: pdfModel → imageModel → built-in provider default. If no pdfModel is configured, OpenClaw automatically falls back to imageModel.

Fallback Logic

The fallback chain works as follows:

imageModel.primary → imageModel.fallbacks[0] → imageModel.fallbacks[1] → ...

OpenClaw tries each model sequentially, returning the first successful response. If all models fail, you'll receive this error:

Error: "No image model configured. Set agents.defaults.imageModel.primary or agents.defaults.imageModel.fallbacks."
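The sequential try-and-fall-back behavior can be sketched as follows. This is a minimal illustration under stated assumptions: `try_with_fallbacks` and `call_model` are hypothetical names, and only the chain order and the error text come from this guide.

```python
# Illustrative sketch of the fallback chain: try the primary, then each
# fallback in order, and return the first successful response.
def try_with_fallbacks(image_model: dict, call_model) -> str:
    chain = [image_model["primary"]] + image_model.get("fallbacks", [])
    for model_id in chain:
        try:
            return call_model(model_id)
        except Exception:
            continue  # a real client would log and inspect each failure
    raise RuntimeError(
        "No image model configured. Set agents.defaults.imageModel.primary "
        "or agents.defaults.imageModel.fallbacks."
    )

# Simulated client: the primary times out, so the fallback answers.
def fake_call(model_id):
    if model_id == "moonshot/kimi-k2.5":
        raise TimeoutError("primary unavailable")
    return f"response from {model_id}"

cfg = {
    "primary": "moonshot/kimi-k2.5",
    "fallbacks": ["openrouter/qwen/qwen-2.5-vl-72b-instruct:free"],
}
print(try_with_fallbacks(cfg, fake_call))
```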

Relationship with pdfModel

PDF processing follows this priority:

pdfModel → imageModel → built-in provider default

If you don't configure pdfModel, the PDF tool will automatically fall back to the imageModel configuration. This design ensures consistent vision model handling across different file types.
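The priority order above can be sketched as a short resolution function. The `resolve_pdf_model` name is an assumption for illustration; the lookup order (`pdfModel`, then `imageModel`, then the built-in default) is exactly what this section describes.

```python
# Sketch of the PDF model resolution order:
# pdfModel -> imageModel -> built-in provider default.
def resolve_pdf_model(defaults: dict, builtin_default: str) -> str:
    for key in ("pdfModel", "imageModel"):
        value = defaults.get(key)
        if value:
            # each entry may be a shorthand string or a full object
            return value if isinstance(value, str) else value["primary"]
    return builtin_default

# With no pdfModel configured, the imageModel primary is used.
defaults = {"imageModel": {"primary": "moonshot/kimi-k2.5"}}
print(resolve_pdf_model(defaults, "anthropic/claude-opus-4-6"))
```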

Built-in Default Image Models

When imageModel is not configured and the system detects the corresponding provider's API key, OpenClaw uses built-in defaults:

| Provider | Default Model |
| --- | --- |
| OpenAI | gpt-5-mini |
| Anthropic | claude-opus-4-6 |
| Google | gemini-3-flash-preview |
| MiniMax | MiniMax-VL-01 |
| ZAI | glm-4.6v |

These defaults ensure vision capabilities work out of the box when you have API keys configured.
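The defaults table above can be expressed as a simple lookup. How OpenClaw actually detects API keys is not shown here; the provider keys and the `builtin_image_default` helper are illustrative assumptions, while the model IDs come from the table.

```python
# The built-in defaults table as a lookup; which entry applies depends on
# which provider API key is detected (detection logic not sketched here).
BUILTIN_IMAGE_DEFAULTS = {
    "openai": "gpt-5-mini",
    "anthropic": "claude-opus-4-6",
    "google": "gemini-3-flash-preview",
    "minimax": "MiniMax-VL-01",
    "zai": "glm-4.6v",
}

def builtin_image_default(detected_providers):
    """Return the default vision model for the first recognized provider."""
    for provider in detected_providers:
        if provider in BUILTIN_IMAGE_DEFAULTS:
            return BUILTIN_IMAGE_DEFAULTS[provider]
    return None

print(builtin_image_default(["anthropic"]))
```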

Complete Configuration Examples

Here's a comprehensive configuration example:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "minimax-portal/MiniMax-M2.5-highspeed",
        "fallbacks": ["moonshot/kimi-k2.5", "anthropic/claude-opus-4-6"]
      },
      "imageModel": {
        "primary": "moonshot/kimi-k2.5",
        "fallbacks": ["openrouter/google/gemini-2.0-flash-vision:free"]
      },
      "pdfModel": {
        "primary": "anthropic/claude-opus-4-6"
      },
      "models": {
        "moonshot/kimi-k2.5": { "alias": "kimi" },
        "minimax-portal/MiniMax-M2.5-highspeed": { "alias": "mm" }
      }
    }
  }
}
```

Expected Behavior

With this configuration:

  • Text conversations → MiniMax-M2.5-highspeed (fast, text-only)
  • Sending images → moonshot/kimi-k2.5, with a fallback to gemini-2.0-flash-vision if it fails
  • Sending PDFs → claude-opus-4-6; if pdfModel were not configured, OpenClaw would fall back to the imageModel chain

🤔 FAQ

Q: Can I use the same model for both text and images?

A: Yes, if your primary model supports multimodal input (like moonshot/kimi-k2.5 or anthropic/claude-opus-4-6), you can set both model.primary and imageModel.primary to the same value.

Q: What happens if I don't configure imageModel?

A: OpenClaw will use built-in default models based on your configured API providers. However, explicitly configuring imageModel gives you more control over which vision model is used.

Q: How do free vision models work in the fallback chain?

A: Models like openrouter/google/gemini-2.0-flash-vision:free or openrouter/qwen/qwen-2.5-vl-72b-instruct:free are free tier models from OpenRouter. They're useful as fallbacks when your primary vision model fails or is unavailable.


Summary & Recommendations

Configuring imageModel in OpenClaw is essential for:

  1. Cost optimization: use fast, cheap text models for regular conversations
  2. Capability assurance: ensure vision tasks always have a capable model
  3. Redundancy: set up fallback chains to prevent single points of failure

Recommended next steps:

  • Check your current configuration with openclaw models status
  • Configure a primary imageModel if you haven't already
  • Add fallback models for redundancy
  • Test with image inputs to verify the configuration works

For more details, visit the OpenClaw Documentation.


Originally published at: OpenClaw imageModel Configuration Guide 2026
