Caption Generator

Generate captions, subtitles, and transcripts for video content through the 1min.AI API. The feature runs speech recognition on the spoken audio in a video and returns the text in several output formats, with support for multiple languages.

Available Models

The model list is populated dynamically on the live documentation page. The request example in this document uses whisper-1.

Request Parameters

All models share the same request structure:

| Field Name | Type | Supported Value | Description | Required |
|---|---|---|---|---|
| type | text | CAPTIONS_GENERATOR | Feature name | ✔️ |
| model | text | See available models | Model identifier | ✔️ |
| conversationId | text | CAPTIONS_GENERATOR | Conversation ID | ✔️ |
| promptObject.videoUrl | string | Asset key | Asset key | ✔️ |
| promptObject.response_format | string | Format type | Response format | |
| promptObject.timestamp_granularities | array | Granularity levels | Timestamp precision | |
| promptObject.language | string | Language code | Language for transcription | |
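The table above maps naturally onto a small payload builder. The following is a minimal Python sketch, not an official client: the function name `build_caption_request` and its argument names are illustrative assumptions, while the JSON field names and the fixed `CAPTIONS_GENERATOR` values come from the table.

```python
from typing import Any, Dict, List, Optional


def build_caption_request(
    video_asset_key: str,
    model: str,
    language: Optional[str] = None,
    response_format: Optional[str] = None,
    timestamp_granularities: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble the JSON body for a CAPTIONS_GENERATOR request.

    Hypothetical helper: only the dictionary keys are taken from the
    parameter table; the function itself is an illustration.
    """
    if not video_asset_key:
        raise ValueError("promptObject.videoUrl (the asset key) is required")
    if not model:
        raise ValueError("model is required")

    prompt_object: Dict[str, Any] = {"videoUrl": video_asset_key}
    # Optional fields are only included when the caller sets them.
    if language is not None:
        prompt_object["language"] = language
    if response_format is not None:
        prompt_object["response_format"] = response_format
    if timestamp_granularities is not None:
        prompt_object["timestamp_granularities"] = list(timestamp_granularities)

    return {
        "type": "CAPTIONS_GENERATOR",
        "model": model,
        "conversationId": "CAPTIONS_GENERATOR",
        "promptObject": prompt_object,
    }
```

Leaving unset optional fields out of the body, rather than sending nulls, is a design choice of this sketch; the table does not say how the API treats null values.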

Endpoint

POST https://api.1min.ai/api/features

Request Headers

| Field | Value |
|---|---|
| API-KEY | <api-key> |
| Content-Type | application/json |

Code Example

curl --location 'https://api.1min.ai/api/features' \
  --header 'API-KEY: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "CAPTIONS_GENERATOR",
    "model": "whisper-1",
    "conversationId": "CAPTIONS_GENERATOR",
    "promptObject": {
      "videoUrl": "your-video-asset-key",
      "language": "en",
      "response_format": "verbose_json",
      "timestamp_granularities": ["word", "segment"]
    }
  }'
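For readers who prefer Python to curl, the same call can be sketched with the standard library. This is an illustrative equivalent under stated assumptions: the helper names `make_request` and `send` are hypothetical, and because this page does not show the response schema, `send` simply returns whatever JSON the server replies with.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.1min.ai/api/features"  # URL taken from the curl example


def make_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=data,
        # urllib normalizes header capitalization internally;
        # HTTP header names are case-insensitive.
        headers={"API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> Any:
    """Send the request and parse the reply as JSON (schema not documented here)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = {
    "type": "CAPTIONS_GENERATOR",
    "model": "whisper-1",
    "conversationId": "CAPTIONS_GENERATOR",
    "promptObject": {
        "videoUrl": "your-video-asset-key",
        "language": "en",
        "response_format": "verbose_json",
        "timestamp_granularities": ["word", "segment"],
    },
}
request = make_request("<api-key>", payload)
# result = send(request)  # uncomment once a real API key and asset key are in place
```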

Response Format

{}

Output Formats

SRT Format

1
00:00:00,000 --> 00:00:03,240
Welcome to this video tutorial on AI technology.

2
00:00:03,240 --> 00:00:07,080
Today we'll explore the latest developments in machine learning.
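If only segment timings are at hand (for example the `start`, `end`, and `text` fields shown in the Verbose JSON example in this document), an SRT file like the one above can be produced locally. The sketch below is one possible conversion, not the service's own implementation; the function names are hypothetical.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)  # work in whole milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}")
    # Cues are separated by one blank line.
    return "\n\n".join(blocks) + "\n"


segments = [
    {"start": 0.0, "end": 3.24,
     "text": "Welcome to this video tutorial on AI technology."},
    {"start": 3.24, "end": 7.08,
     "text": "Today we'll explore the latest developments in machine learning."},
]
print(segments_to_srt(segments))
```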

WebVTT Format

WEBVTT

00:00:00.000 --> 00:00:03.240
Welcome to this video tutorial on AI technology.

00:00:03.240 --> 00:00:07.080
Today we'll explore the latest developments in machine learning.
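The two samples differ only in the `WEBVTT` header, the missing cue numbers, and the decimal separator in the timestamps. A small converter makes that relationship concrete. This is a sketch under the assumption that the input looks like the SRT sample in this document; `srt_to_vtt` is a hypothetical helper, not part of the API.

```python
import re

# Matches an SRT timing line such as "00:00:00,000 --> 00:00:03,240".
_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Drop the numeric cue counter that SRT puts on the first line.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        match = _TIMING.match(lines[0].strip())
        if match is None:
            raise ValueError(f"unrecognized timing line: {lines[0]!r}")
        # WebVTT uses a dot, not a comma, before the milliseconds.
        timing = (
            f"{match.group(1)}.{match.group(2)} --> "
            f"{match.group(3)}.{match.group(4)}"
        )
        cues.append("\n".join([timing] + lines[1:]))
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```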

Verbose JSON Format (Default)

{
  "task": "transcribe",
  "language": "english",
  "duration": 7.08,
  "text": "Welcome to this video tutorial on AI technology. Today we'll explore the latest developments in machine learning.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.24,
      "text": "Welcome to this video tutorial on AI technology.",
      "words": [
        {
          "word": "Welcome",
          "start": 0.0,
          "end": 0.4
        }
      ]
    }
  ]
}
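To work with word-level timings, the verbose JSON can be flattened into a simple list. The sketch below assumes the layout of the sample above, with `words` nested inside each entry of `segments`; the helper name `word_timings` is hypothetical, and a real response may carry additional fields.

```python
import json
from typing import List, Tuple


def word_timings(verbose_json: str) -> List[Tuple[str, float, float]]:
    """Return (word, start, end) tuples from a verbose JSON transcript."""
    doc = json.loads(verbose_json)
    timings = []
    for segment in doc.get("segments", []):
        # Segments without word-level data are skipped.
        for word in segment.get("words", []):
            timings.append((word["word"], word["start"], word["end"]))
    return timings


sample = """
{
  "task": "transcribe",
  "language": "english",
  "duration": 7.08,
  "text": "Welcome to this video tutorial on AI technology.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.24,
      "text": "Welcome to this video tutorial on AI technology.",
      "words": [{"word": "Welcome", "start": 0.0, "end": 0.4}]
    }
  ]
}
"""
print(word_timings(sample))
```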

Supported Languages

The Caption Generator supports 99+ languages including:

Major Languages:

  • English, Spanish, French, German, Italian, Portuguese
  • Russian, Chinese (Simplified/Traditional), Japanese, Korean
  • Arabic, Hindi, Dutch, Polish, Turkish, Swedish
  • Norwegian, Danish, Finnish, Hebrew, Thai, Vietnamese

Regional Variants:

  • Portuguese (Brazil), Spanish (Latin America)
  • Chinese (Simplified/Traditional)
  • English (US/UK/AU variants)

Use Cases

  • Content Creation: Add captions to YouTube videos, social media content
  • Accessibility: Make video content accessible to deaf and hard-of-hearing audiences
  • Language Learning: Generate transcripts for educational content
  • Documentation: Convert meeting recordings to text transcripts
  • SEO Optimization: Create searchable text content from video
  • Compliance: Meet accessibility requirements for corporate content

Best Practices

  1. Audio Quality: Ensure clear audio with minimal background noise
  2. Language Selection: Specify the language for better accuracy when known
  3. File Formats: Use common video formats (MP4, MOV, AVI) for best results
  4. File Size: Keep video files within the 25 MB limit; smaller files process faster
  5. Speaker Clarity: In recordings with several speakers, make sure each voice is clearly audible and speakers do not talk over one another

Technical Requirements

  • Supported Video Formats: MP4, MOV, AVI, MKV, WebM
  • Supported Audio Formats: MP3, WAV, M4A, FLAC
  • Maximum File Size: 25 MB per file
  • Maximum Duration: 30 minutes per video
  • Audio Quality: Minimum 16 kHz sample rate recommended
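These limits can be checked locally before a file is uploaded. The following sketch is a hypothetical pre-flight helper, not part of the API. It assumes that "25MB" means 25 × 1024 × 1024 bytes (the page does not say whether the limit is binary or decimal) and that the file extension is a good enough proxy for the container format.

```python
from pathlib import PurePath
from typing import List, Optional

# Extensions derived from the supported video and audio formats listed above.
SUPPORTED_EXTENSIONS = {
    ".mp4", ".mov", ".avi", ".mkv", ".webm",  # video
    ".mp3", ".wav", ".m4a", ".flac",          # audio
}
MAX_BYTES = 25 * 1024 * 1024  # assumption: binary megabytes
MAX_SECONDS = 30 * 60         # 30 minutes


def preflight_problems(
    filename: str,
    size_bytes: int,
    duration_seconds: Optional[float] = None,
) -> List[str]:
    """Return human-readable problems; an empty list means the file looks OK."""
    problems = []
    extension = PurePath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file extension: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    # Duration is optional because measuring it needs a media probing tool.
    if duration_seconds is not None and duration_seconds > MAX_SECONDS:
        problems.append(f"duration is {duration_seconds} s; limit is {MAX_SECONDS} s")
    return problems


print(preflight_problems("talk.MP4", 10 * 1024 * 1024, 600))
```

Returning a list of problems instead of raising on the first one lets a caller report every issue with a file at once.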