Caption Generator

Generate accurate captions, subtitles, and transcripts for video content using advanced speech recognition and AI models through the 1min.AI API. This feature converts spoken audio in videos to text with high accuracy across multiple languages and formats.

Available Models

Request Parameters

All models share the same request structure:

Field Name	Type	Supported Value	Description	Required
type	text	CAPTIONS_GENERATOR	Feature name	✔️
model	text	See available models	Model identifier	✔️
conversationId	text	CAPTIONS_GENERATOR	Conversation ID	✔️
promptObject.videoUrl	string	Asset key	Asset key of the video to transcribe	✔️
promptObject.response_format	string	Format type (e.g. verbose_json)	Response format	❌
promptObject.timestamp_granularities	array	Granularity levels (e.g. word, segment)	Timestamp precision	❌
promptObject.language	string	Language code (e.g. en)	Language for transcription	❌
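
As a rough illustration of the table above, the request body could be assembled as in the sketch below. The helper name `build_caption_request` is hypothetical and not part of the 1min.AI API; the field names and example values are taken from the table and from the code example on this page. Optional fields are left out unless the caller supplies a value.

```python
def build_caption_request(video_key, model, language=None,
                          response_format=None, timestamp_granularities=None):
    """Assemble a CAPTIONS_GENERATOR request body (hypothetical helper)."""
    prompt = {"videoUrl": video_key}  # required: asset key of the video
    # Optional fields are only included when the caller sets them.
    if language is not None:
        prompt["language"] = language
    if response_format is not None:
        prompt["response_format"] = response_format
    if timestamp_granularities is not None:
        prompt["timestamp_granularities"] = list(timestamp_granularities)
    return {
        "type": "CAPTIONS_GENERATOR",
        "model": model,
        "conversationId": "CAPTIONS_GENERATOR",
        "promptObject": prompt,
    }
```

The returned dictionary can be serialized with `json.dumps` or passed directly as the JSON body of a POST request.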

Endpoint

POST https://api.1min.ai/api/features

Request Headers

Field	Value
API-KEY	`<api-key>`
Content-Type	`application/json`

Code Example

cURL

curl --location 'https://api.1min.ai/api/features' \
--header 'API-KEY: <api-key>' \
--header 'Content-Type: application/json' \
--data '{
"type": "CAPTIONS_GENERATOR",
"model": "whisper-1",
"conversationId": "CAPTIONS_GENERATOR",
"promptObject": {
  "videoUrl": "your-video-asset-key",
  "language": "en",
  "response_format": "verbose_json",
  "timestamp_granularities": ["word", "segment"]
}
}'

JavaScript

fetch('https://api.1min.ai/api/features', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'API-KEY': 'YOUR_API_KEY'
  },
  body: JSON.stringify({
    type: 'CAPTIONS_GENERATOR',
    model: 'whisper-1',
    conversationId: 'CAPTIONS_GENERATOR',
    promptObject: {
      videoUrl: 'your-video-asset-key',
      language: 'en',
      response_format: 'verbose_json',
      timestamp_granularities: ['word', 'segment']
    }
  })
})
  // Print the raw response body, or the error if the request fails
  .then((response) => response.text())
  .then((body) => console.log(body))
  .catch((error) => console.error(error));

Python

import requests

url = "https://api.1min.ai/api/features"
headers = {
    "Content-Type": "application/json",
    "API-KEY": "YOUR_API_KEY"
}

data = {
    "type": "CAPTIONS_GENERATOR",
    "model": "whisper-1",
    "conversationId": "CAPTIONS_GENERATOR",
    "promptObject": {
        "videoUrl": "your-video-asset-key",
        "language": "en",
        "response_format": "verbose_json",
        "timestamp_granularities": ["word", "segment"]
    }
}

response = requests.post(url, headers=headers, json=data)

# Print the HTTP status and the raw response body
print(response.status_code)
print(response.text)

Response Format

{}

Key Features

High Accuracy: Advanced speech-to-text models for precise transcription
Multi-language Support: Supports 99+ languages for global content
Format Flexibility: Generate captions in various subtitle formats
Timestamp Precision: Accurate timing information for perfect synchronization
Video Processing: Direct video upload and processing capabilities
Speaker Detection: Identify different speakers in multi-person content

Output Formats

SRT Format

1
00:00:00,000 --> 00:00:03,240
Welcome to this video tutorial on AI technology.

2
00:00:03,240 --> 00:00:07,080
Today we'll explore the latest developments in machine learning.

WebVTT Format

WEBVTT

00:00:00.000 --> 00:00:03.240
Welcome to this video tutorial on AI technology.

00:00:03.240 --> 00:00:07.080
Today we'll explore the latest developments in machine learning.
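
The SRT and WebVTT samples above carry the same cues and differ only in layout: WebVTT starts with a `WEBVTT` header, has no cue numbers in this sample, and uses a dot instead of a comma before the milliseconds. A minimal sketch of converting one to the other follows. The function `srt_to_webvtt` is a hypothetical helper, not something the API provides, and it ignores styling and positioning.

```python
import re

# Matches an SRT timing line such as "00:00:00,000 --> 00:00:03,240".
_TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$")

def srt_to_webvtt(srt_text):
    """Convert simple SRT text to WebVTT (sketch; ignores styling)."""
    out = ["WEBVTT", ""]
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Drop the numeric cue index that SRT puts before the timing line.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        match = _TIMING.match(lines[0].strip())
        if match:
            # WebVTT uses "." instead of "," before the milliseconds.
            lines[0] = "{}.{} --> {}.{}".format(*match.groups())
        out.extend(lines)
        out.append("")
    return "\n".join(out).rstrip("\n") + "\n"
```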

Verbose JSON Format (Default)

{
  "task": "transcribe",
  "language": "english",
  "duration": 7.08,
  "text": "Welcome to this video tutorial on AI technology. Today we'll explore the latest developments in machine learning.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.24,
      "text": "Welcome to this video tutorial on AI technology.",
      "words": [
        {
          "word": "Welcome",
          "start": 0.0,
          "end": 0.4
        }
      ]
    }
  ]
}
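
Because each entry in `segments` carries `start`, `end`, and `text`, a verbose JSON transcript holds everything needed to build subtitle cues locally. The sketch below assumes the transcript has already been parsed into a dictionary shaped like the sample above; `format_srt_time` and `segments_to_srt` are hypothetical helper names.

```python
def format_srt_time(seconds):
    """Format a number of seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def segments_to_srt(transcript):
    """Build SRT text from the "segments" list of a verbose JSON transcript."""
    cues = []
    for number, segment in enumerate(transcript["segments"], start=1):
        start = format_srt_time(segment["start"])
        end = format_srt_time(segment["end"])
        # Each cue is: index, timing line, text, then a blank separator line.
        cues.append(f"{number}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(cues)
```

Applied to the sample above, the first segment becomes a cue numbered 1 that runs from 00:00:00,000 to 00:00:03,240.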

Supported Languages

The Caption Generator supports 99+ languages, including:

Major Languages:

English, Spanish, French, German, Italian, Portuguese
Russian, Chinese (Simplified/Traditional), Japanese, Korean
Arabic, Hindi, Dutch, Polish, Turkish, Swedish
Norwegian, Danish, Finnish, Hebrew, Thai, Vietnamese

Regional Variants:

Portuguese (Brazil), Spanish (Latin America)
Chinese (Simplified/Traditional)
English (US/UK/AU variants)
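
The request example on this page passes `en` as the language, which matches the ISO 639-1 two-letter code for English. Assuming the API accepts ISO 639-1 codes for the other languages as well (this page does not confirm it), a small lookup like the sketch below could map language names to codes. `LANGUAGE_CODES` and `language_code` are hypothetical names, and only a few of the listed languages are included.

```python
# ISO 639-1 codes for a few of the languages listed above.
# Assumption: the API accepts these codes the same way it accepts "en".
LANGUAGE_CODES = {
    "english": "en", "spanish": "es", "french": "fr", "german": "de",
    "italian": "it", "portuguese": "pt", "russian": "ru", "chinese": "zh",
    "japanese": "ja", "korean": "ko", "arabic": "ar", "hindi": "hi",
}

def language_code(name):
    """Return the ISO 639-1 code for a language name, or None if unknown."""
    return LANGUAGE_CODES.get(name.strip().lower())
```

When the lookup returns `None`, one option is to omit the optional `language` field from the request rather than send a guess.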

Use Cases

Content Creation: Add captions to YouTube videos, social media content
Accessibility: Make video content accessible to deaf and hard-of-hearing audiences
Language Learning: Generate transcripts for educational content
Documentation: Convert meeting recordings to text transcripts
SEO Optimization: Create searchable text content from video
Compliance: Meet accessibility requirements for corporate content

Best Practices

Audio Quality: Ensure clear audio with minimal background noise
Language Selection: Specify the language for better accuracy when known
File Formats: Use common video formats (MP4, MOV, AVI) for best results
File Size: Keep video files under 25 MB for optimal processing speed
Speaker Clarity: For multi-speaker detection, ensure speakers are distinct

Technical Requirements

Supported Video Formats: MP4, MOV, AVI, MKV, WebM
Supported Audio Formats: MP3, WAV, M4A, FLAC
Maximum File Size: 25 MB per file
Maximum Duration: 30 minutes per video
Audio Quality: Minimum 16 kHz sample rate recommended
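
The format and size limits above can be checked locally before a file is uploaded. The sketch below is one way to do that; `check_media` is a hypothetical helper, and reading "25MB" as 25 × 1024 × 1024 bytes is an assumption, since this page does not define the unit. Duration and sample rate are not checked here because they require inspecting the media stream.

```python
import os.path

# Limits and formats copied from the list above.
# Assumption: "25MB" means 25 * 1024 * 1024 bytes; the page does not say.
MAX_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {
    ".mp4", ".mov", ".avi", ".mkv", ".webm",  # video
    ".mp3", ".wav", ".m4a", ".flac",          # audio
}

def check_media(filename, size_bytes):
    """Return a list of problems for a file name and size (empty if none)."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems
```

For a file on disk, the size argument can come from `os.path.getsize(path)`.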

Error Handling

Common error scenarios:

Unsupported file formats or corrupted files
Files exceeding size or duration limits
Poor audio quality resulting in low confidence scores
Network timeouts for large file uploads
Language detection failures for unclear speech

For detailed error codes and troubleshooting, refer to the main API documentation.
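
Of the scenarios listed above, network timeouts are the one a client can often recover from by retrying. The sketch below shows a generic retry loop with exponential backoff. `call_with_retries` is a hypothetical helper and `send` stands for any function that performs the request. It catches Python's built-in `TimeoutError`; the exception to catch depends on the HTTP client, and with `requests` it would be `requests.exceptions.Timeout`.

```python
import time

def call_with_retries(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return send()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the timeout to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

The other scenarios, such as unsupported formats or files over the size limit, will fail the same way on every attempt, so retrying them is not useful.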
