Caption Generator
Generate captions, subtitles, and transcripts for video content through the 1min.AI API. The feature uses speech recognition models to convert the spoken audio in a video to text, supports 99+ languages, and can return the result as SRT, WebVTT, or verbose JSON.
Available Models
The examples on this page use whisper-1.
Request Parameters
All models share the same request structure:
| Field Name | Type | Supported Values | Description | Required |
|---|---|---|---|---|
| type | string | CAPTIONS_GENERATOR | Feature name | ✔️ |
| model | string | See available models | Model identifier | ✔️ |
| conversationId | string | CAPTIONS_GENERATOR | Conversation ID | ✔️ |
| promptObject.videoUrl | string | Asset key | Asset key of the video to transcribe | ✔️ |
| promptObject.response_format | string | Format type, for example verbose_json | Format of the returned transcript | ❌ |
| promptObject.timestamp_granularities | array | Granularity levels, for example word and segment | Timestamp precision | ❌ |
| promptObject.language | string | Language code, for example en | Language of the spoken audio | ❌ |
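The table above can be turned into a small helper that assembles the request body. This is a minimal sketch, not part of the 1min.AI SDK: the function name `build_caption_request` and its defaults are assumptions made for illustration, and the only field values used are the ones that appear on this page.

```python
from typing import Any, Dict, List, Optional


def build_caption_request(
    video_url: str,
    model: str = "whisper-1",
    language: Optional[str] = None,
    response_format: Optional[str] = None,
    timestamp_granularities: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a request body from the fields in the parameter table.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if not video_url:
        raise ValueError("promptObject.videoUrl is required")

    prompt_object: Dict[str, Any] = {"videoUrl": video_url}
    # Optional fields are sent only when the caller sets them.
    if language is not None:
        prompt_object["language"] = language
    if response_format is not None:
        prompt_object["response_format"] = response_format
    if timestamp_granularities is not None:
        prompt_object["timestamp_granularities"] = list(timestamp_granularities)

    return {
        "type": "CAPTIONS_GENERATOR",
        "model": model,
        "conversationId": "CAPTIONS_GENERATOR",
        "promptObject": prompt_object,
    }


body = build_caption_request("your-video-asset-key", language="en")
```

Leaving optional keys out of the body, rather than sending them as null, keeps the request close to the documented examples.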
Endpoint
POST https://api.1min.ai/api/features
Request Headers
| Field | Value |
|---|---|
| API-KEY | <api-key> |
| Content-Type | application/json |
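One way to avoid hard-coding the key is to read it from the environment when building the two headers above. The sketch below assumes an environment variable named `ONEMIN_API_KEY`; that name and the helper `build_headers` are hypothetical and not defined by the API.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the two request headers listed in the table.

    ONEMIN_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = api_key or os.environ.get("ONEMIN_API_KEY")
    if not key:
        raise ValueError("no API key provided")
    return {"API-KEY": key, "Content-Type": "application/json"}
```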
Code Example
cURL
```shell
curl --location 'https://api.1min.ai/api/features' \
  --header 'API-KEY: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "CAPTIONS_GENERATOR",
    "model": "whisper-1",
    "conversationId": "CAPTIONS_GENERATOR",
    "promptObject": {
      "videoUrl": "your-video-asset-key",
      "language": "en",
      "response_format": "verbose_json",
      "timestamp_granularities": ["word", "segment"]
    }
  }'
```
JavaScript
```javascript
fetch('https://api.1min.ai/api/features', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'API-KEY': '<api-key>'
  },
  body: JSON.stringify({
    type: 'CAPTIONS_GENERATOR',
    model: 'whisper-1',
    conversationId: 'CAPTIONS_GENERATOR',
    promptObject: {
      videoUrl: 'your-video-asset-key',
      language: 'en',
      response_format: 'verbose_json',
      timestamp_granularities: ['word', 'segment']
    }
  })
})
  // Assumes the API answers with a JSON body.
  .then((response) => response.json())
  .then((data) => console.log(data))
  .catch((error) => console.error(error));
```
Python
```python
import requests

url = "https://api.1min.ai/api/features"
headers = {
    "Content-Type": "application/json",
    "API-KEY": "<api-key>"
}
data = {
    "type": "CAPTIONS_GENERATOR",
    "model": "whisper-1",
    "conversationId": "CAPTIONS_GENERATOR",
    "promptObject": {
        "videoUrl": "your-video-asset-key",
        "language": "en",
        "response_format": "verbose_json",
        "timestamp_granularities": ["word", "segment"]
    }
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # raise on HTTP 4xx/5xx
print(response.json())  # assumes the API answers with a JSON body
```
Response Format
{}
Output Formats
SRT Format
```text
1
00:00:00,000 --> 00:00:03,240
Welcome to this video tutorial on AI technology.

2
00:00:03,240 --> 00:00:07,080
Today we'll explore the latest developments in machine learning.
```
WebVTT Format
```text
WEBVTT

00:00:00.000 --> 00:00:03.240
Welcome to this video tutorial on AI technology.

00:00:03.240 --> 00:00:07.080
Today we'll explore the latest developments in machine learning.
```
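The SRT and WebVTT samples differ mainly in the decimal separator of the timestamp (a comma in SRT, a period in WebVTT). A rough sketch of reading a cue timing line from either format into seconds follows. The helper names are illustrative, and the parser handles only the `HH:MM:SS` form shown in the samples, not the shorter `MM:SS.mmm` form that WebVTT also permits.

```python
import re
from typing import Tuple

# Matches HH:MM:SS,mmm (SRT) and HH:MM:SS.mmm (WebVTT).
_TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})")


def parse_timestamp(value: str) -> float:
    """Convert one caption timestamp to seconds."""
    match = _TIMESTAMP.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognised timestamp: {value!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_cue_timing(line: str) -> Tuple[float, float]:
    """Split a 'start --> end' line into (start_seconds, end_seconds)."""
    start, end = line.split("-->")
    return parse_timestamp(start), parse_timestamp(end)


srt_start, srt_end = parse_cue_timing("00:00:03,240 --> 00:00:07,080")
vtt_start, vtt_end = parse_cue_timing("00:00:03.240 --> 00:00:07.080")
```

Both calls yield the same pair of values, since only the separator differs between the two lines.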
Verbose JSON Format (Default)
```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 7.08,
  "text": "Welcome to this video tutorial on AI technology. Today we'll explore the latest developments in machine learning.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.24,
      "text": "Welcome to this video tutorial on AI technology.",
      "words": [
        {
          "word": "Welcome",
          "start": 0.0,
          "end": 0.4
        }
      ]
    }
  ]
}
```
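If a verbose JSON transcript is already in hand, SRT cues can be derived from its `segments` array locally. The following is a sketch under the assumption that each segment carries `start`, `end`, and `text` as in the sample above; `segments_to_srt` and `format_srt_timestamp` are illustrative names, and the second segment in the usage data is reconstructed from the SRT sample rather than taken from a real response.

```python
from typing import Any, Dict


def format_srt_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS,mmm, the form used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(transcript: Dict[str, Any]) -> str:
    """Build SRT text from the 'segments' list of a verbose JSON transcript."""
    cues = []
    # SRT cue numbers start at 1, while segment ids start at 0.
    for number, segment in enumerate(transcript["segments"], start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        cues.append(f"{number}\n{start} --> {end}\n{segment['text'].strip()}")
    # Cues are separated by a blank line.
    return "\n\n".join(cues) + "\n"


transcript = {
    "segments": [
        {"id": 0, "start": 0.0, "end": 3.24,
         "text": "Welcome to this video tutorial on AI technology."},
        {"id": 1, "start": 3.24, "end": 7.08,
         "text": "Today we'll explore the latest developments in machine learning."},
    ]
}
print(segments_to_srt(transcript))
```

Timestamps are converted to whole milliseconds before formatting so that floating-point values such as 3.24 do not produce off-by-one digits.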
Supported Languages
The Caption Generator supports 99+ languages including:
Major Languages:
- English, Spanish, French, German, Italian, Portuguese
- Russian, Chinese (Simplified/Traditional), Japanese, Korean
- Arabic, Hindi, Dutch, Polish, Turkish, Swedish
- Norwegian, Danish, Finnish, Hebrew, Thai, Vietnamese
Regional Variants:
- Portuguese (Brazil), Spanish (Latin America)
- Chinese (Simplified/Traditional)
- English (US/UK/AU variants)
Use Cases
- Content Creation: Add captions to YouTube videos, social media content
- Accessibility: Make video content accessible to deaf and hard-of-hearing audiences
- Language Learning: Generate transcripts for educational content
- Documentation: Convert meeting recordings to text transcripts
- SEO Optimization: Create searchable text content from video
- Compliance: Meet accessibility requirements for corporate content
Best Practices
- Audio Quality: Ensure clear audio with minimal background noise
- Language Selection: Specify the language for better accuracy when known
- File Formats: Use common video formats (MP4, MOV, AVI) for best results
- File Size: Keep video files within the 25MB limit; smaller files process faster
- Speaker Clarity: For recordings with several speakers, make sure each voice is distinct and speakers do not talk over one another
Technical Requirements
- Supported Video Formats: MP4, MOV, AVI, MKV, WebM
- Supported Audio Formats: MP3, WAV, M4A, FLAC
- Maximum File Size: 25MB per file
- Maximum Duration: 30 minutes per video
- Audio Quality: Minimum 16kHz sample rate recommended
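The limits above can be checked locally before a file is uploaded. This is a minimal sketch, assuming that "25MB" means 25 × 1024 × 1024 bytes and that the caller already knows the duration (for example from a media probing tool); `check_upload` is a hypothetical helper, not an API call.

```python
from pathlib import PurePath
from typing import List, Optional

# Assumption: the documented 25MB limit is interpreted as 25 MiB.
MAX_FILE_BYTES = 25 * 1024 * 1024
MAX_DURATION_SECONDS = 30 * 60
SUPPORTED_EXTENSIONS = {
    ".mp4", ".mov", ".avi", ".mkv", ".webm",  # video
    ".mp3", ".wav", ".m4a", ".flac",          # audio
}


def check_upload(
    filename: str,
    size_bytes: int,
    duration_seconds: Optional[float] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems: List[str] = []
    extension = PurePath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 25MB")
    # Duration is optional because it cannot be read from the name or size.
    if duration_seconds is not None and duration_seconds > MAX_DURATION_SECONDS:
        problems.append("duration is longer than 30 minutes")
    return problems


print(check_upload("webinar.mp4", 10_000_000, duration_seconds=600))
```

The extension check is only a heuristic: it looks at the file name, not at the actual container or codec.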