Chat with Image
Engage in intelligent conversations about images using AI vision models. The Chat with Image API allows you to upload images and have natural language discussions about their content, analyze visual elements, extract information, and get detailed descriptions through an interactive chat interface.
Chat with Image Models
The Chat with Image API supports advanced AI models with vision capabilities:
- All Models: Have intelligent conversations about images using various AI vision models, including GPT-4o, Gemini, Claude, and more.
Key Features
The Chat with Image API offers comprehensive image analysis and conversation capabilities:
- Multi-Modal Conversations: Combine text prompts with image inputs for rich discussions
- Image Analysis: Analyze visual content, objects, scenes, and compositions
- Visual Question Answering: Ask specific questions about image content and get detailed answers
- Context Retention: Maintain conversation context across multiple exchanges
- Multiple Images: Upload and discuss multiple images in a single conversation
- Web Search Integration: Enhanced responses with real-time web search when enabled
- Memory Support: Long-term memory for personalized conversations (when enabled)
Available Models
OpenAI Models:
- gpt-4o: GPT-4o - Advanced vision and reasoning capabilities
- gpt-4o-mini: GPT-4o Mini - Efficient vision model with good performance
- gpt-4-turbo: GPT-4 Turbo - High-quality vision and text understanding
- gpt-5: GPT-5 - Latest generation with enhanced visual reasoning
- gpt-5-mini: GPT-5 Mini - Efficient next-generation vision model
- gpt-5-chat-latest: GPT-5 Chat Latest - Most recent GPT-5 variant
Google Models:
- gemini-1.5-pro: Gemini 1.5 Pro - Advanced multimodal understanding
- gemini-1.5-flash: Gemini 1.5 Flash - Fast and efficient vision processing
- gemini-2.0-flash: Gemini 2.0 Flash - Latest generation with improved capabilities
- gemini-2.5-pro: Gemini 2.5 Pro - Most advanced Google vision model
Anthropic Models:
- claude-3-opus-20240229: Claude 3 Opus - Premium vision and reasoning
- claude-3-sonnet: Claude 3 Sonnet - Balanced performance and capability
- claude-3-5-sonnet-20240620: Claude 3.5 Sonnet - Enhanced visual understanding
- claude-4-opus: Claude 4 Opus - Latest Anthropic vision model
Other Models:
- mistral-large-latest: Mistral Large - Advanced reasoning with vision
- pixtral-12b: Mistral Pixtral - Specialized vision model
- meta/llama-3.1-405b-instruct: LLaMA 3.1 405B - Large scale multimodal model
Common Request Structure
All Chat with Image conversations use the same streaming endpoint:
Request Headers
| Field | Value |
|---|---|
| API-KEY | <api-key> |
| Content-Type | application/json |
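As a minimal sketch of how these two headers might be attached to a request, the snippet below prepares (but does not send) a JSON request using only the Python standard library. The endpoint URL is a placeholder because the real one is not given in this section, and the use of POST is an assumption; the helper names `build_headers` and `prepare_request` are illustrative only.

```python
import json
import urllib.request

# Placeholder only: the real endpoint URL is not stated in this section.
ENDPOINT_URL = "https://api.example.com/chat-with-image"


def build_headers(api_key):
    """Return the two request headers listed in the table above."""
    return {
        "API-KEY": api_key,
        "Content-Type": "application/json",
    }


def prepare_request(api_key, body):
    """Build, without sending, a JSON request that carries the headers.

    Assumption: the endpoint accepts POST with a JSON body.
    """
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=build_headers(api_key),
        method="POST",
    )
```

Sending the prepared request would be a call to `urllib.request.urlopen(request)` once the real endpoint URL and a valid API key are substituted.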
Common Parameters
| Field Name | Type | Description | Required |
|---|---|---|---|
| type | text | Feature type: CHAT_WITH_IMAGE | ✔️ |
| model | text | AI model identifier | ✔️ |
| conversationId | text | Unique conversation identifier | ✔️ |
| promptObject.prompt | string | Your message or question about the image(s) | ✔️ |
| promptObject.imageList | array | Array of uploaded image keys | ✔️ |
| promptObject.isMixed | boolean | Enable mixed conversation mode | ✖️ |
| promptObject.webSearch | boolean | Enable web search for enhanced responses | ✖️ |
| promptObject.numOfSite | number | Number of websites to search (if webSearch enabled) | ✖️ |
| promptObject.maxWord | number | Maximum words per website (if webSearch enabled) | ✖️ |
Note: Image files must be uploaded first using the Asset API to obtain image keys for the imageList parameter.
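The parameters above could be assembled into a request body roughly as follows. This is a sketch under stated assumptions: the dotted names (for example promptObject.prompt) are taken to mean fields nested inside a `promptObject` object, optional fields are simply omitted when unset, and the image key and conversation ID in the usage example are made-up values. The function name `build_chat_payload` is illustrative, not part of the API.

```python
import json


def build_chat_payload(model, conversation_id, prompt, image_keys,
                       is_mixed=None, web_search=False,
                       num_of_site=None, max_word=None):
    """Assemble a CHAT_WITH_IMAGE request body from the documented fields.

    Assumption: dotted field names denote nesting under "promptObject".
    """
    if not image_keys:
        # imageList is marked as required in the parameter table.
        raise ValueError("image_keys must contain at least one image key")

    prompt_object = {
        "prompt": prompt,
        "imageList": list(image_keys),
    }
    if is_mixed is not None:
        prompt_object["isMixed"] = is_mixed
    if web_search:
        prompt_object["webSearch"] = True
        # numOfSite and maxWord only apply when web search is enabled.
        if num_of_site is not None:
            prompt_object["numOfSite"] = num_of_site
        if max_word is not None:
            prompt_object["maxWord"] = max_word

    return {
        "type": "CHAT_WITH_IMAGE",
        "model": model,
        "conversationId": conversation_id,
        "promptObject": prompt_object,
    }


if __name__ == "__main__":
    # "example-image-key" stands in for a key returned by the Asset API.
    payload = build_chat_payload(
        model="gpt-4o",
        conversation_id="example-conversation-id",
        prompt="What objects are visible in this image?",
        image_keys=["example-image-key"],
    )
    print(json.dumps(payload, indent=2))
```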
Conversation Flow
- Upload Images: First upload your images using the Asset API to get image keys
- Start Conversation: Send initial message with image(s) and your question
- Continue Chat: Send follow-up messages to continue the conversation
- Context Maintained: The AI remembers previous exchanges in the conversation
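The flow above can be sketched as two request bodies that share one conversationId, which is presumably how the service links a follow-up to the earlier exchange. Several details here are assumptions rather than documented behavior: the ID is generated client-side as a UUID, the follow-up resends the same image keys because imageList is marked as required, and "example-image-key" stands in for a key returned by the Asset API.

```python
import uuid


def new_conversation_id():
    """Generate a unique conversation identifier.

    Assumption: any unique string is accepted; a UUID is used here.
    """
    return str(uuid.uuid4())


def make_turn(conversation_id, model, prompt, image_keys):
    """Build the request body for one message in a conversation."""
    return {
        "type": "CHAT_WITH_IMAGE",
        "model": model,
        "conversationId": conversation_id,
        "promptObject": {
            "prompt": prompt,
            "imageList": list(image_keys),
        },
    }


def demo_turns():
    """Return an initial message and a follow-up in the same conversation."""
    conversation_id = new_conversation_id()
    image_keys = ["example-image-key"]  # hypothetical Asset API key

    first = make_turn(conversation_id, "gpt-4o",
                      "Describe this image.", image_keys)
    # Reusing conversation_id is what ties the follow-up to the first turn.
    follow_up = make_turn(conversation_id, "gpt-4o",
                          "What colors dominate the scene?", image_keys)
    return first, follow_up
```

Starting a fresh conversation would mean calling `new_conversation_id()` again, so earlier context is not carried over.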
Use Cases
- Image Description: Get detailed descriptions of image content
- Visual Analysis: Analyze composition, colors, styles, and artistic elements
- Object Recognition: Identify and discuss specific objects in images
- Scene Understanding: Understand contexts, settings, and environments
- Comparative Analysis: Compare multiple images and discuss differences
- Educational Support: Learn about visual concepts and elements
- Creative Feedback: Get feedback on artwork, designs, or photographs
Getting Started
- Upload Your Images: Use the Asset API to upload images and get image keys
- Choose Your Model: Select from available vision-capable models
- Start Conversation: Send your first message with the images
- Interactive Chat: Continue the conversation with follow-up questions
- Test with Playground: Use the interactive playground in model documentation
For detailed implementation examples, parameter specifications, and interactive testing, visit the individual model documentation pages listed above.