Chat with Image

Engage in intelligent conversations about images using AI vision models. The Chat with Image API allows you to upload images and have natural language discussions about their content, analyze visual elements, extract information, and get detailed descriptions through an interactive chat interface.

Chat with Image Models

The Chat with Image API supports advanced AI models with vision capabilities. The supported model identifiers are listed under Available Models.

Key Features

The Chat with Image API offers comprehensive image analysis and conversation capabilities:

  • Multi-Modal Conversations: Combine text prompts with image inputs for rich discussions
  • Image Analysis: Analyze visual content, objects, scenes, and compositions
  • Visual Question Answering: Ask specific questions about image content and get detailed answers
  • Context Retention: Maintain conversation context across multiple exchanges
  • Multiple Images: Upload and discuss multiple images in a single conversation
  • Web Search Integration: Enhanced responses with real-time web search when enabled
  • Memory Support: Long-term memory for personalized conversations (when enabled)

Available Models

OpenAI Models:

  • gpt-4o - GPT-4o - Advanced vision and reasoning capabilities
  • gpt-4o-mini - GPT-4o Mini - Efficient vision model with good performance
  • gpt-4-turbo - GPT-4 Turbo - High-quality vision and text understanding
  • gpt-5 - GPT-5 - Latest generation with enhanced visual reasoning
  • gpt-5-mini - GPT-5 Mini - Efficient next-generation vision model
  • gpt-5-chat-latest - GPT-5 Chat Latest - Most recent GPT-5 variant

Google Models:

  • gemini-1.5-pro - Gemini 1.5 Pro - Advanced multimodal understanding
  • gemini-1.5-flash - Gemini 1.5 Flash - Fast and efficient vision processing
  • gemini-2.0-flash - Gemini 2.0 Flash - Latest generation with improved capabilities
  • gemini-2.5-pro - Gemini 2.5 Pro - Most advanced Google vision model

Anthropic Models:

  • claude-3-opus-20240229 - Claude 3 Opus - Premium vision and reasoning
  • claude-3-sonnet - Claude 3 Sonnet - Balanced performance and capability
  • claude-3-5-sonnet-20240620 - Claude 3.5 Sonnet - Enhanced visual understanding
  • claude-4-opus - Claude 4 Opus - Latest Anthropic vision model

Other Models:

  • mistral-large-latest - Mistral Large - Advanced reasoning with vision
  • pixtral-12b - Mistral Pixtral - Specialized vision model
  • meta/llama-3.1-405b-instruct - LLaMA 3.1 405B - Large scale multimodal model

Common Request Structure

All Chat with Image conversations use the same streaming endpoint:

Request Headers

Field          Value
API-KEY        <api-key>
Content-Type   application/json
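
As a minimal sketch, the two headers above could be attached to a request in Python as follows. The endpoint URL is not shown in this section, so ENDPOINT_URL is a placeholder. The build_request helper is a hypothetical name, and the POST method is an assumption rather than something this page states. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Placeholder only: the real streaming endpoint is not shown in this section.
ENDPOINT_URL = "https://api.example.com/chat-with-image"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Attach the API-KEY and Content-Type headers to a JSON request."""
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",  # assumption: the HTTP method is not stated above
    )


# Build (but do not send) a request carrying a minimal JSON body.
req = build_request("<api-key>", {"type": "CHAT_WITH_IMAGE"})
```

urllib normalizes the capitalization of header names when storing them; HTTP header names are case-insensitive, so this should not matter to a conforming server.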

Common Parameters

Field Name              Type     Description                                          Required
type                    text     Feature type: CHAT_WITH_IMAGE                        ✔️
model                   text     AI model identifier                                  ✔️
conversationId          text     Unique conversation identifier                       ✔️
promptObject.prompt     string   Your message or question about the image(s)          ✔️
promptObject.imageList  array    Array of uploaded image keys                         ✔️
promptObject.isMixed    boolean  Enable mixed conversation mode                       ✖️
promptObject.webSearch  boolean  Enable web search for enhanced responses             ✖️
promptObject.numOfSite  number   Number of websites to search (if webSearch enabled)  ✖️
promptObject.maxWord    number   Maximum words per website (if webSearch enabled)     ✖️

Note: Image files must be uploaded first using the Asset API to obtain image keys for the imageList parameter.
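
The parameters above might be assembled into a request body along these lines. This sketch assumes that the dotted names (promptObject.prompt and so on) denote fields nested inside a promptObject object, that conversationId can be generated by the client as a UUID, and that imageList must contain at least one key; none of these is confirmed on this page. The build_chat_payload helper and the image key "example-image-key" are hypothetical.

```python
import json
import uuid


def build_chat_payload(model, prompt, image_keys, conversation_id=None,
                       web_search=False, num_of_site=None, max_word=None):
    """Assemble a CHAT_WITH_IMAGE request body from the documented fields."""
    if not image_keys:
        # Assumption: a required imageList should not be empty.
        raise ValueError("imageList needs at least one uploaded image key")

    prompt_object = {"prompt": prompt, "imageList": list(image_keys)}
    if web_search:
        prompt_object["webSearch"] = True
        # numOfSite and maxWord only apply when webSearch is enabled.
        if num_of_site is not None:
            prompt_object["numOfSite"] = num_of_site
        if max_word is not None:
            prompt_object["maxWord"] = max_word

    return {
        "type": "CHAT_WITH_IMAGE",
        "model": model,
        # Assumption: the client may generate its own conversation identifier.
        "conversationId": conversation_id or str(uuid.uuid4()),
        "promptObject": prompt_object,
    }


payload = build_chat_payload(
    "gpt-4o",
    "What is shown in this image?",
    ["example-image-key"],  # hypothetical key returned by the Asset API
    conversation_id="conv-1",
    web_search=True,
    num_of_site=3,
)
print(json.dumps(payload, indent=2))
```

Optional fields are left out of the body unless they are set, so a request without web search carries only the required parameters.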

Conversation Flow

  1. Upload Images: First upload your images using the Asset API to get image keys
  2. Start Conversation: Send initial message with image(s) and your question
  3. Continue Chat: Send follow-up messages to continue the conversation
  4. Context Maintained: The AI remembers previous exchanges in the conversation
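
The flow above can be sketched as a small helper that reuses one conversationId for every turn. It is an illustration under assumptions: the ImageConversation class is hypothetical, the conversation identifier is assumed to be client-generated, and the image keys are resent on follow-up messages only because imageList is marked as required; the page does not say whether follow-ups actually need them. Sending the request and reading the streamed reply are left out.

```python
import uuid


class ImageConversation:
    """Tracks the identifiers shared by every message in one conversation."""

    def __init__(self, model, image_keys):
        self.model = model
        self.image_keys = list(image_keys)
        # Assumption: the identifier is chosen by the client and reused per turn.
        self.conversation_id = str(uuid.uuid4())

    def next_payload(self, prompt):
        """Build the request body for the next message in this conversation."""
        return {
            "type": "CHAT_WITH_IMAGE",
            "model": self.model,
            "conversationId": self.conversation_id,
            "promptObject": {
                "prompt": prompt,
                "imageList": list(self.image_keys),
            },
        }


# Step 1 (uploading via the Asset API) is assumed to have produced this key.
chat = ImageConversation("gpt-4o", ["example-image-key"])
first = chat.next_payload("What objects are in this photo?")
follow_up = chat.next_payload("What color is the largest one?")
```

Because both payloads carry the same conversationId, the service can presumably associate the follow-up with the earlier exchange.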

Use Cases

  • Image Description: Get detailed descriptions of image content
  • Visual Analysis: Analyze composition, colors, styles, and artistic elements
  • Object Recognition: Identify and discuss specific objects in images
  • Scene Understanding: Understand contexts, settings, and environments
  • Comparative Analysis: Compare multiple images and discuss differences
  • Educational Support: Learn about visual concepts and elements
  • Creative Feedback: Get feedback on artwork, designs, or photographs

Getting Started

  1. Upload Your Images: Use the Asset API to upload images and get image keys
  2. Choose Your Model: Select from available vision-capable models
  3. Start Conversation: Send your first message with the images
  4. Interactive Chat: Continue the conversation with follow-up questions
  5. Test with Playground: Use the interactive playground in model documentation

For detailed implementation examples, parameter specifications, and interactive testing, visit the individual model documentation pages listed above.