Whisper-1 - Speech to Text
Convert speech to text using OpenAI's powerful Whisper-1 model, offering exceptional accuracy across multiple languages and audio qualities. Whisper is a state-of-the-art automatic speech recognition system trained on diverse multilingual data.
Note: Audio files must first be uploaded using the Asset API before transcription. The audioUrl parameter should contain the path returned from the Asset API upload.
Supported Models
whisper-1: OpenAI's Whisper model with multilingual support and robust performance across various audio conditions
Endpoint
POST https://api.1min.ai/api/features
Request Headers
| Field | Value |
|---|---|
| API-KEY | <api-key> |
| Content-Type | application/json |
Supported Audio Formats
- MP3 - MPEG Audio Layer III
- WAV - Waveform Audio File Format
- M4A - MPEG-4 Audio
- FLAC - Free Lossless Audio Codec
- MP4 - MPEG-4 Part 14 (audio only)
- WEBM - WebM Audio
- OGG - Ogg Vorbis
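A quick local check can catch an unsupported file before it is uploaded. The sketch below is only an illustration: the helper name `is_supported_audio` is hypothetical, and it assumes the file extension reflects the actual container format.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".webm", ".ogg"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so `talk.MP3` and `talk.mp3` are treated the same way.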
Language Support
OpenAI's Whisper model supports 99+ languages. The language parameter takes short language codes, most of them two-letter ISO 639-1 codes (haw for Hawaiian is a three-letter exception). Supported codes include:
- af - Afrikaans
- am - Amharic
- ar - Arabic
- as - Assamese
- az - Azerbaijani
- ba - Bashkir
- be - Belarusian
- bg - Bulgarian
- bn - Bengali
- bo - Tibetan
- br - Breton
- bs - Bosnian
- ca - Catalan
- cs - Czech
- cy - Welsh
- da - Danish
- de - German
- el - Greek
- en - English
- es - Spanish
- et - Estonian
- eu - Basque
- fa - Persian
- fi - Finnish
- fo - Faroese
- fr - French
- gl - Galician
- gu - Gujarati
- ha - Hausa
- haw - Hawaiian
- he - Hebrew
- hi - Hindi
- hr - Croatian
- ht - Haitian Creole
- hu - Hungarian
- hy - Armenian
- id - Indonesian
- is - Icelandic
- it - Italian
- ja - Japanese
- jw - Javanese
- ka - Georgian
- kk - Kazakh
- km - Khmer
- kn - Kannada
- ko - Korean
- la - Latin
- lb - Luxembourgish
- ln - Lingala
- lo - Lao
- lt - Lithuanian
- lv - Latvian
- mg - Malagasy
- mi - Maori
- mk - Macedonian
- ml - Malayalam
- mn - Mongolian
- mr - Marathi
- ms - Malay
- mt - Maltese
- my - Myanmar
- ne - Nepali
- nl - Dutch
- nn - Norwegian Nynorsk
- no - Norwegian
- oc - Occitan
- pa - Punjabi
- pl - Polish
- ps - Pashto
- pt - Portuguese
- ro - Romanian
- ru - Russian
- sa - Sanskrit
- sd - Sindhi
- si - Sinhala
- sk - Slovak
- sl - Slovenian
- sn - Shona
- so - Somali
- sq - Albanian
- sr - Serbian
- su - Sundanese
- sv - Swedish
- sw - Swahili
- ta - Tamil
- te - Telugu
- tg - Tajik
- th - Thai
- tk - Turkmen
- tl - Tagalog
- tr - Turkish
- tt - Tatar
- uk - Ukrainian
- ur - Urdu
- uz - Uzbek
- vi - Vietnamese
- yi - Yiddish
- yo - Yoruba
- zh - Chinese
Note: For the complete and most up-to-date list of all supported languages, please refer to the OpenAI Whisper Supported Languages documentation.
Response Formats
Whisper supports multiple response formats:
- text - Plain text transcription (default)
- json - JSON object with transcription text
- srt - SubRip subtitle format
- verbose_json - JSON with additional metadata including timestamps and confidence scores
- vtt - WebVTT subtitle format
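When the srt format is requested, the transcript arrives as SubRip text that usually needs to be split into cues before further processing. The following is a minimal sketch, assuming the raw SubRip text has already been extracted from the API response (how the service wraps that text is not specified here). `Cue` and `parse_srt` are hypothetical names used for illustration.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    index: int
    start: str
    end: str
    text: str


# SRT timing line, e.g. "00:00:01,000 --> 00:00:04,200"
_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text: str) -> list[Cue]:
    """Split SRT text into cues; blocks without a valid header are skipped."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2:
            continue
        match = _TIMING.match(lines[1].strip())
        if not lines[0].strip().isdigit() or match is None:
            continue
        text = "\n".join(lines[2:])
        cues.append(Cue(int(lines[0]), match.group(1), match.group(2), text))
    return cues
```

Timestamps are kept as strings so the cues can be written back out unchanged; converting them to seconds is left to the caller.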
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Feature type, must be "SPEECH_TO_TEXT" |
| model | string | Yes | Model identifier, use "whisper-1" |
| promptObject.audioUrl | string | Yes | Path to audio file (uploaded via Asset API) |
| promptObject.response_format | string | No | Format of transcription response (default: "text") |
| promptObject.language | string | No | Language code for transcription (auto-detected if not specified) |
| promptObject.prompt | string | No | Optional text prompt to guide the transcription style |
| promptObject.temperature | number | No | Sampling temperature between 0 and 1 (default: 0) |
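The parameter table maps naturally onto a small helper that assembles the request body and rejects out-of-range values before any request is sent. This is a sketch under stated assumptions: `build_payload` is a hypothetical name, and the checks mirror only what the table and the response format list state, not any server-side validation.

```python
from typing import Optional

# Values from the "Response Formats" list.
RESPONSE_FORMATS = {"text", "json", "srt", "verbose_json", "vtt"}


def build_payload(
    audio_url: str,
    response_format: str = "text",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: float = 0.0,
) -> dict:
    """Assemble the request body, rejecting values outside the documented ranges."""
    if not audio_url:
        raise ValueError("audioUrl is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    prompt_object = {
        "audioUrl": audio_url,
        "response_format": response_format,
        "temperature": temperature,
    }
    # Optional fields are left out when unset so the service applies
    # its defaults (for example, language auto-detection).
    if language is not None:
        prompt_object["language"] = language
    if prompt is not None:
        prompt_object["prompt"] = prompt

    return {
        "type": "SPEECH_TO_TEXT",
        "model": "whisper-1",
        "promptObject": prompt_object,
    }
```

For example, `build_payload("audios/example.mp3", language="en")` produces a body with the language set and no prompt field; the path shown is a placeholder for whatever the Asset API returned.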
Code Examples
cURL

    curl --location 'https://api.1min.ai/api/features' \
      --header 'API-KEY: <api-key>' \
      --header 'Content-Type: application/json' \
      --data '{
        "type": "SPEECH_TO_TEXT",
        "model": "whisper-1",
        "promptObject": {
          "audioUrl": "audios/2025_10_21_08_22_58_741_whisper_audio.mp3",
          "response_format": "text"
        }
      }'
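JavaScript

    // Send the transcription request and log the raw response body.
    fetch('https://api.1min.ai/api/features', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'API-KEY': 'YOUR_API_KEY'
      },
      body: JSON.stringify({
        type: 'SPEECH_TO_TEXT',
        model: 'whisper-1',
        promptObject: {
          audioUrl: 'audios/2025_10_21_08_22_58_741_whisper_audio.mp3',
          response_format: 'text',
          language: 'en'
        }
      })
    })
      .then((response) => response.text())
      .then((body) => console.log(body))
      .catch((error) => console.error('Request failed:', error));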
Python

    import requests

    url = "https://api.1min.ai/api/features"
    headers = {
        "Content-Type": "application/json",
        "API-KEY": "YOUR_API_KEY"
    }
    data = {
        "type": "SPEECH_TO_TEXT",
        "model": "whisper-1",
        "promptObject": {
            "audioUrl": "audios/2025_10_21_08_22_58_741_whisper_audio.mp3",
            "response_format": "text",
            "language": "en"
        }
    }

    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()  # raise an exception on HTTP 4xx/5xx
    print(response.text)
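For environments where the requests package is not available, the same call can be sketched with the standard library alone. This is an illustration, not an official client: the function names `build_request` and `send` and the 60-second timeout are assumptions. Building the request is kept separate from sending it so the headers and body can be inspected without network access.

```python
import json
import urllib.request

API_URL = "https://api.1min.ai/api/features"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Create the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call then looks like `send(build_request("YOUR_API_KEY", data))`, where `data` is the same dictionary used in the Python example.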
Use Cases
- Podcast Transcription: Convert podcast episodes and audio content to searchable text
- Meeting Documentation: Transcribe business meetings, conferences, and interviews
- Educational Content: Create transcripts for lectures, training sessions, and educational videos
- Content Creation: Generate text versions of audio content for blogs and articles
- Accessibility: Create captions and transcripts for audio/video content
- Multilingual Content: Transcribe content in 99+ supported languages
- Voice Memos: Convert personal voice recordings to text notes
- Customer Service: Transcribe customer calls and support interactions
- Media Production: Generate subtitles and closed captions for video content
- Research: Transcribe interviews, focus groups, and research recordings
Tips for Best Results
- Upload First: Use the Asset API to upload your audio file before transcription
- Audio Quality: High-quality audio with clear speech produces the best results
- Language Specification: While optional, specifying the language can improve accuracy
- Response Format: Choose the appropriate format based on your needs (text, JSON, SRT, etc.)
- Temperature Setting: Use 0 for consistent, repeatable results; higher values make the output more random
- File Size: Whisper handles files up to 25 MB effectively
- Background Noise: Minimize background noise for optimal transcription quality
- Multiple Speakers: Whisper can handle multiple speakers but works best with clear audio
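The file size tip above can be enforced locally before uploading. A minimal sketch, assuming "25 MB" means 25 * 1024 * 1024 bytes (the exact threshold is not stated); `check_file_size` is a hypothetical helper name.

```python
import os

# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024


def check_file_size(path: str, max_bytes: int = MAX_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds max_bytes."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; limit is {max_bytes}")
    return size
```

Running this check before the Asset API upload avoids spending an upload on a file that is larger than the size Whisper is described as handling well.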