Whisper-1 - Speech to Text

Convert speech to text using OpenAI's powerful Whisper-1 model, offering exceptional accuracy across multiple languages and audio qualities. Whisper is a state-of-the-art automatic speech recognition system trained on diverse multilingual data.

Note: Audio files must first be uploaded using the Asset API before transcription. The audioUrl parameter should contain the path returned from the Asset API upload.

Supported Models

  • whisper-1: OpenAI's Whisper model with multilingual support and robust performance across various audio conditions

Endpoint

POST https://api.1min.ai/api/features

Request Headers

Field         Value
API-KEY       <api-key>
Content-Type  application/json

Supported Audio Formats

  • MP3 - MPEG Audio Layer III
  • WAV - Waveform Audio File Format
  • M4A - MPEG-4 Audio
  • FLAC - Free Lossless Audio Codec
  • MP4 - MPEG-4 Part 14 (audio only)
  • WEBM - WebM Audio
  • OGG - Ogg Vorbis
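
A quick client-side check can catch unsupported files before they are uploaded. The sketch below is a minimal illustration: the helper name `is_supported_audio` is hypothetical, and matching on the file extension is an assumption of this example rather than documented API behavior.

```python
from pathlib import Path

# Extensions derived from the format list above. Matching by extension is an
# assumption of this sketch; the API may inspect the file contents instead.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".webm", ".ogg"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so `talk.MP3` and `talk.mp3` are treated the same way.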

Language Support

OpenAI's Whisper model supports 99+ languages. Languages are identified mostly by two-letter ISO 639-1 codes; a few entries, such as haw for Hawaiian, use other codes. The supported codes include:

  • af - Afrikaans
  • am - Amharic
  • ar - Arabic
  • as - Assamese
  • az - Azerbaijani
  • ba - Bashkir
  • be - Belarusian
  • bg - Bulgarian
  • bn - Bengali
  • bo - Tibetan
  • br - Breton
  • bs - Bosnian
  • ca - Catalan
  • cs - Czech
  • cy - Welsh
  • da - Danish
  • de - German
  • el - Greek
  • en - English
  • es - Spanish
  • et - Estonian
  • eu - Basque
  • fa - Persian
  • fi - Finnish
  • fo - Faroese
  • fr - French
  • gl - Galician
  • gu - Gujarati
  • ha - Hausa
  • haw - Hawaiian
  • he - Hebrew
  • hi - Hindi
  • hr - Croatian
  • ht - Haitian Creole
  • hu - Hungarian
  • hy - Armenian
  • id - Indonesian
  • is - Icelandic
  • it - Italian
  • ja - Japanese
  • jw - Javanese
  • ka - Georgian
  • kk - Kazakh
  • km - Khmer
  • kn - Kannada
  • ko - Korean
  • la - Latin
  • lb - Luxembourgish
  • ln - Lingala
  • lo - Lao
  • lt - Lithuanian
  • lv - Latvian
  • mg - Malagasy
  • mi - Maori
  • mk - Macedonian
  • ml - Malayalam
  • mn - Mongolian
  • mr - Marathi
  • ms - Malay
  • mt - Maltese
  • my - Myanmar
  • ne - Nepali
  • nl - Dutch
  • nn - Norwegian Nynorsk
  • no - Norwegian
  • oc - Occitan
  • pa - Punjabi
  • pl - Polish
  • ps - Pashto
  • pt - Portuguese
  • ro - Romanian
  • ru - Russian
  • sa - Sanskrit
  • sd - Sindhi
  • si - Sinhala
  • sk - Slovak
  • sl - Slovenian
  • sn - Shona
  • so - Somali
  • sq - Albanian
  • sr - Serbian
  • su - Sundanese
  • sv - Swedish
  • sw - Swahili
  • ta - Tamil
  • te - Telugu
  • tg - Tajik
  • th - Thai
  • tk - Turkmen
  • tl - Tagalog
  • tr - Turkish
  • tt - Tatar
  • uk - Ukrainian
  • ur - Urdu
  • uz - Uzbek
  • vi - Vietnamese
  • yi - Yiddish
  • yo - Yoruba
  • zh - Chinese

Note: For the complete and most up-to-date list of all supported languages, please refer to the OpenAI Whisper Supported Languages documentation.

Response Formats

Whisper supports multiple response formats:

  • text - Plain text transcription (default)
  • json - JSON object with transcription text
  • srt - SubRip subtitle format
  • verbose_json - JSON with additional metadata including timestamps and confidence scores
  • vtt - WebVTT subtitle format
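
One way to choose among these formats is to derive the value from the file you intend to save the transcript to. The sketch below assumes a hypothetical helper, `response_format_for`, and a hypothetical mapping from file suffix to format; only the five format names come from the list above.

```python
from pathlib import Path

# Hypothetical mapping from output file suffix to a response_format value.
RESPONSE_FORMAT_BY_SUFFIX = {
    ".txt": "text",
    ".json": "json",
    ".srt": "srt",
    ".vtt": "vtt",
}


def response_format_for(output_path: str, verbose: bool = False) -> str:
    """Pick a response_format for the given output file.

    Unknown suffixes fall back to "text", the documented default.
    With verbose=True, JSON output is upgraded to "verbose_json".
    """
    suffix = Path(output_path).suffix.lower()
    fmt = RESPONSE_FORMAT_BY_SUFFIX.get(suffix, "text")
    if fmt == "json" and verbose:
        return "verbose_json"
    return fmt
```

For subtitle work this keeps the request and the saved file in step: a path ending in `.srt` or `.vtt` requests the matching subtitle format.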

Parameters

Parameter                     Type    Required  Description
type                          string  Yes       Feature type, must be "SPEECH_TO_TEXT"
model                         string  Yes       Model identifier, use "whisper-1"
promptObject.audioUrl         string  Yes       Path to audio file (uploaded via Asset API)
promptObject.response_format  string  No        Format of transcription response (default: "text")
promptObject.language         string  No        Language code for transcription (auto-detected if not specified)
promptObject.prompt           string  No        Optional text prompt to guide the transcription style
promptObject.temperature      number  No        Sampling temperature between 0 and 1 (default: 0)
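
The parameters above can be assembled into a request body programmatically. The following is a minimal sketch, not an official client: `build_payload` is a hypothetical helper, and the client-side validation simply mirrors the constraints stated in the table. Optional fields are omitted when unset so the API defaults apply.

```python
from typing import Any, Dict, Optional

# Response formats listed in this document.
RESPONSE_FORMATS = {"text", "json", "srt", "verbose_json", "vtt"}


def build_payload(
    audio_url: str,
    response_format: str = "text",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the JSON body for a SPEECH_TO_TEXT request."""
    if not audio_url:
        raise ValueError("audioUrl is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    prompt_object: Dict[str, Any] = {
        "audioUrl": audio_url,
        "response_format": response_format,
    }
    # Only include optional fields that were explicitly set.
    if language is not None:
        prompt_object["language"] = language
    if prompt is not None:
        prompt_object["prompt"] = prompt
    if temperature is not None:
        prompt_object["temperature"] = temperature

    return {
        "type": "SPEECH_TO_TEXT",
        "model": "whisper-1",
        "promptObject": prompt_object,
    }
```

Calling `build_payload("audios/example.mp3", language="en")` yields a body with `audioUrl`, `response_format`, and `language` set, and no `prompt` or `temperature` keys. The path `audios/example.mp3` is a placeholder for whatever the Asset API returned.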

Code Examples

curl --location 'https://api.1min.ai/api/features' \
  --header 'API-KEY: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "SPEECH_TO_TEXT",
    "model": "whisper-1",
    "promptObject": {
      "audioUrl": "audios/2025_10_21_08_22_58_741_whisper_audio.mp3",
      "response_format": "text"
    }
  }'
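
The same request can be sent from Python with only the standard library. This is a sketch under stated assumptions: `build_request` and `transcribe` are hypothetical helper names, and because the shape of the response body is not described here, `transcribe` returns the raw response text without parsing it.

```python
import json
import urllib.request

API_URL = "https://api.1min.ai/api/features"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Create the POST request with the headers listed under Request Headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def transcribe(api_key: str, payload: dict, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text.

    The response structure is not documented in this section, so the body
    is returned unparsed (an assumption of this sketch).
    """
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (performs a network request, so it is left commented out):
# body = transcribe("<api-key>", {
#     "type": "SPEECH_TO_TEXT",
#     "model": "whisper-1",
#     "promptObject": {
#         "audioUrl": "audios/2025_10_21_08_22_58_741_whisper_audio.mp3",
#         "response_format": "text",
#     },
# })
```

Splitting request construction from sending makes it possible to inspect or log the request without contacting the API.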

Example with the language set explicitly:

curl -X POST "https://api.1min.ai/api/features" \
  -H "API-KEY: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "SPEECH_TO_TEXT",
    "model": "whisper-1",
    "promptObject": {
      "audioUrl": "audios/2025_10_21_08_22_58_741_whisper_audio.mp3",
      "response_format": "text",
      "language": "en"
    }
  }'

Use Cases

  • Podcast Transcription: Convert podcast episodes and audio content to searchable text
  • Meeting Documentation: Transcribe business meetings, conferences, and interviews
  • Educational Content: Create transcripts for lectures, training sessions, and educational videos
  • Content Creation: Generate text versions of audio content for blogs and articles
  • Accessibility: Create captions and transcripts for audio/video content
  • Multilingual Content: Transcribe content in 99+ supported languages
  • Voice Memos: Convert personal voice recordings to text notes
  • Customer Service: Transcribe customer calls and support interactions
  • Media Production: Generate subtitles and closed captions for video content
  • Research: Transcribe interviews, focus groups, and research recordings

Tips for Best Results

  1. Upload First: Use the Asset API to upload your audio file before transcription
  2. Audio Quality: High-quality audio with clear speech produces the best results
  3. Language Specification: While optional, specifying the language can improve accuracy
  4. Response Format: Choose the appropriate format based on your needs (text, JSON, SRT, etc.)
  5. Temperature Setting: Use 0 for consistent results; higher values make the output more random
  6. File Size: Keep audio files at or below 25MB
  7. Background Noise: Minimize background noise for optimal transcription quality
  8. Multiple Speakers: Whisper can handle multiple speakers but works best with clear audio
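
The file-size tip lends itself to a pre-upload check. The sketch below uses hypothetical helpers (`size_issues`, `check_file`) and assumes that "25MB" means 25 × 1024 × 1024 bytes; the exact threshold enforced by the service is not stated in this document.

```python
import os
from typing import List

# Assumption: "25MB" is interpreted as 25 MiB (26,214,400 bytes).
MAX_BYTES = 25 * 1024 * 1024


def size_issues(size_bytes: int) -> List[str]:
    """Return human-readable problems for a file of the given size."""
    issues = []
    if size_bytes == 0:
        issues.append("file is empty")
    elif size_bytes > MAX_BYTES:
        issues.append(
            f"file is {size_bytes} bytes, above the {MAX_BYTES}-byte limit"
        )
    return issues


def check_file(path: str) -> List[str]:
    """Run the size check against a file on disk."""
    return size_issues(os.path.getsize(path))
```

An empty list means the file passed; for larger recordings, splitting or compressing the audio before upload is one way to stay under the threshold.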