Chat with Image Models

Have intelligent conversations about images using advanced AI vision models. Upload images and engage in natural language discussions about their content, analyze visual elements, extract information, and get detailed descriptions through an interactive chat interface.

Available Models

The Chat with Image API supports multiple AI models with vision capabilities:

OpenAI Models:

  • gpt-4o - GPT-4o (128K context, $0.25 input / $10 output per 1K tokens)
  • gpt-4o-mini - GPT-4o Mini (128K context, $0.15 input / $0.6 output per 1K tokens)
  • gpt-4-turbo - GPT-4 Turbo (128K context, $10 input / $30 output per 1K tokens)
  • gpt-5 - GPT-5 (400K context, $1.25 input / $10 output per 1K tokens)
  • gpt-5-mini - GPT-5 Mini (400K context, $0.25 input / $2 output per 1K tokens)
  • gpt-5-chat-latest - GPT-5 Chat Latest (128K context, $1.25 input / $10 output per 1K tokens)

Google Models:

  • gemini-1.5-pro - Gemini 1.5 Pro (2M context, $1.25 input / $5 output per 1K tokens)
  • gemini-1.5-flash - Gemini 1.5 Flash (1M context, $0.075 input / $0.3 output per 1K tokens)
  • gemini-2.0-flash - Gemini 2.0 Flash (1M context, $0.075 input / $0.3 output per 1K tokens)
  • gemini-2.5-pro - Gemini 2.5 Pro (2M context, $1.25 input / $5 output per 1K tokens)

Anthropic Models:

  • claude-3-opus-20240229 - Claude 3 Opus (200K context, $15 input / $75 output per 1K tokens)
  • claude-3-sonnet - Claude 3 Sonnet (200K context, $3 input / $15 output per 1K tokens)
  • claude-3-5-sonnet-20240620 - Claude 3.5 Sonnet (200K context, $3 input / $15 output per 1K tokens)
  • claude-4-opus - Claude 4 Opus (1M context, $15 input / $75 output per 1K tokens)

Other Models:

  • mistral-large-latest - Mistral Large 2 (128K context, $2 input / $6 output per 1K tokens)
  • pixtral-12b - Mistral Pixtral 12B (128K context, $0.15 input / $0.15 output per 1K tokens)
  • meta/llama-3.1-405b-instruct - LLaMA 3.1 405B (128K context, $5.32 input / $16 output per 1K tokens)
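
As a rough worked example of the pricing arithmetic, the sketch below estimates the cost of one request from token counts. The function name `estimate_cost` and the `PRICES_USD` table are hypothetical helpers, not part of the API. Prices are copied from the list above and the "per 1K tokens" unit is taken at face value; providers commonly quote prices per million tokens, so confirm the unit against your billing before relying on the result.

```python
# Hypothetical helper: prices copied from the model list above as
# (input price, output price) in USD per `per_tokens` tokens.
PRICES_USD = {
    "gpt-4o-mini": (0.15, 0.6),
    "gemini-1.5-flash": (0.075, 0.3),
    "claude-3-5-sonnet-20240620": (3.0, 15.0),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  per_tokens: int = 1000) -> float:
    """Estimate request cost in USD.

    `per_tokens` is the pricing unit. It defaults to 1000 because that is
    the unit printed on this page; this is an assumption to verify.
    """
    input_price, output_price = PRICES_USD[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / per_tokens


if __name__ == "__main__":
    # 2000 input tokens and 500 output tokens on gpt-4o-mini.
    print(f"{estimate_cost('gpt-4o-mini', 2000, 500):.2f}")
```

Passing `per_tokens=1_000_000` switches the same calculation to per-million pricing without changing the table.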

Request Parameters

Field Name | Type | Example | Description | Required
type | text | CHAT_WITH_IMAGE | Feature identifier | ✔️
model | text | gpt-4o | AI model to use | ✔️
conversationId | text | image-analysis-123 | Unique conversation identifier | ✔️
promptObject.prompt | string | What do you see in this image? | Your message or question | ✔️
promptObject.imageList | array | ["image1.jpg", "image2.png"] | Array of uploaded image keys | ✔️
promptObject.isMixed | boolean | false | Enable mixed conversation mode | ✖️
promptObject.webSearch | boolean | false | Enable web search for enhanced responses | ✖️
promptObject.numOfSite | number | 5 | Number of sites to search (if webSearch enabled) | ✖️
promptObject.maxWord | number | 1000 | Max words per site (if webSearch enabled) | ✖️

Parameter Details

imageList: Array of image keys obtained from uploading images via the Asset API. Images should be in PNG, JPEG, or WebP format.

conversationId: Unique identifier for the conversation. Reuse the same ID to continue an existing conversation; generate a new ID for each new chat session.

isMixed: When enabled, allows mixing different models within the same conversation thread.

webSearch: Enables real-time web search to enhance responses with current information related to the image content.
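
The parameters above can be assembled into a request body with a small helper. This is a minimal sketch: the function name `build_chat_with_image_payload` and its argument names are hypothetical, and only the JSON field names come from the table. It assumes the search options are only worth sending when webSearch is enabled, which the table implies but does not state as a rule.

```python
from typing import Optional, Sequence


def build_chat_with_image_payload(
    model: str,
    conversation_id: str,
    prompt: str,
    image_keys: Sequence[str],
    *,
    is_mixed: Optional[bool] = None,
    web_search: bool = False,
    num_of_site: Optional[int] = None,
    max_word: Optional[int] = None,
) -> dict:
    """Build a CHAT_WITH_IMAGE request body from the documented fields."""
    # imageList is marked as required, so reject an empty list early.
    if not image_keys:
        raise ValueError("imageList needs at least one uploaded image key")

    prompt_object: dict = {"prompt": prompt, "imageList": list(image_keys)}
    if is_mixed is not None:
        prompt_object["isMixed"] = is_mixed
    if web_search:
        prompt_object["webSearch"] = True
        # Assumption: search options are only meaningful with webSearch on.
        if num_of_site is not None:
            prompt_object["numOfSite"] = num_of_site
        if max_word is not None:
            prompt_object["maxWord"] = max_word

    return {
        "type": "CHAT_WITH_IMAGE",
        "model": model,
        "conversationId": conversation_id,
        "promptObject": prompt_object,
    }
```

For example, `build_chat_with_image_payload("gpt-4o", "image-analysis-123", "What do you see in this image?", ["image1.jpg"])` yields a body containing only the required fields.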

Endpoint

POST https://api.1min.ai/api/features

Request Headers

Field | Value
API-KEY | <api-key>
Content-Type | application/json

Code Example

curl --location 'https://api.1min.ai/api/features' \
  --header 'API-KEY: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "CHAT_WITH_IMAGE",
    "model": "gpt-4o",
    "conversationId": "image-analysis-session-1",
    "promptObject": {
      "prompt": "What do you see in this image? Please describe the scene in detail.",
      "imageList": ["uploads/images/photo-123.jpg"],
      "isMixed": false,
      "webSearch": false,
      "numOfSite": 3,
      "maxWord": 500
    }
  }'
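
The same request can be issued from Python using only the standard library. This is a sketch rather than an official client: `build_request` and `send` are hypothetical names, and the only details taken from this page are the URL, the two headers, and the JSON body. Nothing is sent until `send` is called.

```python
import json
import urllib.request

API_URL = "https://api.1min.ai/api/features"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare a POST request equivalent to the curl command above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = {
    "type": "CHAT_WITH_IMAGE",
    "model": "gpt-4o",
    "conversationId": "image-analysis-session-1",
    "promptObject": {
        "prompt": "What do you see in this image? Please describe the scene in detail.",
        "imageList": ["uploads/images/photo-123.jpg"],
    },
}
request = build_request("<api-key>", payload)  # replace with a real key before sending
```

Calling `send(request)` performs the network call; keeping it separate from `build_request` makes the request easy to inspect or log first.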

Response Format

{
  "status": "success",
  "message": "Streaming response will be returned",
  "data": {
    "conversationId": "image-analysis-session-1",
    "aiRecordId": "record-123",
    "streamingUrl": "/api/features/stream/record-123"
  }
}
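
A client will typically need to turn the relative streamingUrl into an absolute URL before reading the stream. The sketch below shows one way to do that. It assumes the path is relative to the API host https://api.1min.ai and that any status other than "success" should be treated as a failure; neither is stated on this page, and the function name `extract_stream_url` is hypothetical.

```python
import json
from urllib.parse import urljoin

BASE_URL = "https://api.1min.ai"  # assumption: streamingUrl is relative to this host


def extract_stream_url(response_body: str, base_url: str = BASE_URL) -> str:
    """Return the absolute streaming URL from a response body."""
    parsed = json.loads(response_body)
    # Assumption: non-"success" statuses indicate an error.
    if parsed.get("status") != "success":
        raise RuntimeError(f"request failed: {parsed.get('message')}")
    return urljoin(base_url, parsed["data"]["streamingUrl"])


sample = json.dumps({
    "status": "success",
    "message": "Streaming response will be returned",
    "data": {
        "conversationId": "image-analysis-session-1",
        "aiRecordId": "record-123",
        "streamingUrl": "/api/features/stream/record-123",
    },
})
stream_url = extract_stream_url(sample)
```

With the sample response above, `stream_url` is `https://api.1min.ai/api/features/stream/record-123`.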

Conversation Examples

Basic Image Description

{
  "type": "CHAT_WITH_IMAGE",
  "model": "gpt-4o",
  "conversationId": "photo-analysis-1",
  "promptObject": {
    "prompt": "Describe this image in detail",
    "imageList": ["landscape-photo.jpg"]
  }
}

Multiple Images Comparison

{
  "type": "CHAT_WITH_IMAGE",
  "model": "claude-3-5-sonnet-20240620",
  "conversationId": "comparison-session",
  "promptObject": {
    "prompt": "Compare these two images and tell me the differences",
    "imageList": ["image1.jpg", "image2.jpg"]
  }
}

Web Search Enhanced Analysis

{
  "type": "CHAT_WITH_IMAGE",
  "model": "gemini-1.5-pro",
  "conversationId": "research-session",
  "promptObject": {
    "prompt": "What building is this and what's its historical significance?",
    "imageList": ["building-photo.jpg"],
    "webSearch": true,
    "numOfSite": 5,
    "maxWord": 1000
  }
}

Best Practices

  1. Image Quality: Use clear, high-resolution images for better analysis results
  2. Specific Questions: Ask specific questions to get more detailed and relevant responses
  3. Context: Provide context about what you're looking for in the image
  4. Conversation Flow: Use the same conversationId to maintain context across multiple exchanges
  5. Model Selection: Choose models based on your needs:
    • GPT-4o: Best overall performance for detailed analysis
    • Gemini: Excellent for large context and multiple images
    • Claude: Great for creative and detailed descriptions
    • Mini models: Cost-effective for simple questions
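
To illustrate the conversation-flow practice, the sketch below derives a follow-up request from an earlier one while keeping the same conversationId. The helper name `follow_up` is hypothetical. The page does not say whether image keys must be resent on later turns; because imageList is marked as required, this sketch keeps it unchanged.

```python
def follow_up(previous_payload: dict, prompt: str) -> dict:
    """Return a new request body for the next turn of the same conversation.

    The previous payload is left unmodified. Only the prompt changes;
    conversationId, model, and imageList are carried over.
    """
    prompt_object = dict(previous_payload["promptObject"])
    prompt_object["prompt"] = prompt
    return {**previous_payload, "promptObject": prompt_object}


first_turn = {
    "type": "CHAT_WITH_IMAGE",
    "model": "gpt-4o",
    "conversationId": "photo-analysis-1",
    "promptObject": {
        "prompt": "Describe this image in detail",
        "imageList": ["landscape-photo.jpg"],
    },
}
second_turn = follow_up(first_turn, "What time of day does it appear to be?")
```

Both payloads share the conversationId `photo-analysis-1`, so the second question is asked in the context of the first.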

Error Handling

Common error scenarios:

  • Invalid image format: Ensure images are in PNG, JPEG, or WebP format
  • Image too large: Resize images if they exceed size limits
  • Missing images: Verify that imageList contains valid uploaded image keys
  • Model limitations: Some models may have specific image size or format requirements
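
Some of these errors can be caught before a request is sent. The sketch below flags image keys whose file extension is not one of the supported formats. It assumes the uploaded image key keeps the original file extension, as the example keys on this page do; the helper name `unsupported_image_keys` is hypothetical, and size limits are not checked because this page does not state them.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Extensions for the formats listed above: PNG, JPEG, WebP.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def unsupported_image_keys(image_keys: Iterable[str]) -> List[str]:
    """Return the keys whose extension is not a supported image format."""
    return [
        key
        for key in image_keys
        if PurePosixPath(key).suffix.lower() not in SUPPORTED_EXTENSIONS
    ]


rejected = unsupported_image_keys(
    ["uploads/images/photo-123.jpg", "scan.TIFF", "diagram.webp"]
)
```

Here `rejected` contains only `scan.TIFF`. An extension check is a cheap first filter, not proof that the file content is a valid image.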

Authentication

This endpoint requires an API key, passed in the API-KEY request header.