Whisper-1 - Speech to Text

Convert speech to text with OpenAI's Whisper-1 model, which transcribes accurately across many languages and a wide range of audio quality. Whisper is an automatic speech recognition (ASR) system trained on a large, diverse multilingual dataset.

Note: Upload audio files with the Asset API before transcribing them. Set promptObject.audioUrl to the path returned by the Asset API upload (for example, audios/2025_10_21_08_22_58_741_whisper_audio.mp3).

Supported Models

whisper-1: OpenAI's Whisper model with multilingual support and robust performance across various audio conditions

Endpoint

POST https://api.1min.ai/api/features

Request Headers

Field	Value
API-KEY	`<api-key>`
Content-Type	`application/json`

Supported Audio Formats

MP3 - MPEG Audio Layer III
WAV - Waveform Audio File Format
M4A - MPEG-4 Audio
FLAC - Free Lossless Audio Codec
MP4 - MPEG-4 Part 14 (audio only)
WEBM - WebM Audio
OGG - Ogg Vorbis
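A quick local check before uploading can catch unsupported files early. The sketch below is a hypothetical helper, not part of the 1min.ai API: it compares the file extension against the formats listed above and the file size against a 25 MB limit. That limit is an assumption borrowed from OpenAI's own Whisper API; the 1min.ai endpoint may enforce a different one.

```python
import os

# Extensions taken from the "Supported Audio Formats" list.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".webm", ".ogg"}

# Assumption: 25 MB limit as documented for OpenAI's Whisper API (conservative decimal reading).
MAX_BYTES = 25_000_000


def check_audio_file(path: str, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of problems with `path`; an empty list means it looks uploadable.

    Raises FileNotFoundError if the file does not exist.
    """
    problems = []
    ext = os.path.splitext(path)[1].lower()  # ".MP3" and ".mp3" are treated alike
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension {ext!r}")
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes, limit is {max_bytes}")
    return problems
```

An extension check only looks at the file name; it does not verify that the contents are actually valid audio.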

Language Support

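OpenAI's Whisper model supports 99 languages, though accuracy varies widely and is lowest for languages with little training data. Language codes are ISO 639-1 two-letter codes, with two Whisper-specific exceptions: haw (Hawaiian, which has no ISO 639-1 code) and jw (Javanese, whose ISO 639-1 code is jv). The supported languages are: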

af - Afrikaans
am - Amharic
ar - Arabic
as - Assamese
az - Azerbaijani
ba - Bashkir
be - Belarusian
bg - Bulgarian
bn - Bengali
bo - Tibetan
br - Breton
bs - Bosnian
ca - Catalan
cs - Czech
cy - Welsh
da - Danish
de - German
el - Greek
en - English
es - Spanish
et - Estonian
eu - Basque
fa - Persian
fi - Finnish
fo - Faroese
fr - French
gl - Galician
gu - Gujarati
ha - Hausa
haw - Hawaiian
he - Hebrew
hi - Hindi
hr - Croatian
ht - Haitian Creole
hu - Hungarian
hy - Armenian
id - Indonesian
is - Icelandic
it - Italian
ja - Japanese
jw - Javanese
ka - Georgian
kk - Kazakh
km - Khmer
kn - Kannada
ko - Korean
la - Latin
lb - Luxembourgish
ln - Lingala
lo - Lao
lt - Lithuanian
lv - Latvian
mg - Malagasy
mi - Maori
mk - Macedonian
ml - Malayalam
mn - Mongolian
mr - Marathi
ms - Malay
mt - Maltese
my - Burmese (Myanmar)
ne - Nepali
nl - Dutch
nn - Norwegian Nynorsk
no - Norwegian
oc - Occitan
pa - Punjabi
pl - Polish
ps - Pashto
pt - Portuguese
ro - Romanian
ru - Russian
sa - Sanskrit
sd - Sindhi
si - Sinhala
sk - Slovak
sl - Slovenian
sn - Shona
so - Somali
sq - Albanian
sr - Serbian
su - Sundanese
sv - Swedish
sw - Swahili
ta - Tamil
te - Telugu
tg - Tajik
th - Thai
tk - Turkmen
tl - Tagalog
tr - Turkish
tt - Tatar
uk - Ukrainian
ur - Urdu
uz - Uzbek
vi - Vietnamese
yi - Yiddish
yo - Yoruba
zh - Chinese

Note: For the complete and most up-to-date list of all supported languages, please refer to the OpenAI Whisper Supported Languages documentation.
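If users supply the language code, a client-side check can catch typos such as "jv" instead of "jw" before a request is sent. The sketch below copies the 99 codes from the list above into a set; the function name normalize_language is hypothetical, and the hosted endpoint may accept a slightly different set, so treat this as a convenience check rather than the API's own validation.

```python
# The 99 codes from the list above. Most are ISO 639-1; "haw" and "jw" are Whisper-specific.
WHISPER_LANGUAGE_CODES = frozenset("""
af am ar as az ba be bg bn bo br bs ca cs cy da de el en es et eu fa fi fo fr
gl gu ha haw he hi hr ht hu hy id is it ja jw ka kk km kn ko la lb ln lo lt lv
mg mi mk ml mn mr ms mt my ne nl nn no oc pa pl ps pt ro ru sa sd si sk sl sn
so sq sr su sv sw ta te tg th tk tl tr tt uk ur uz vi yi yo zh
""".split())


def normalize_language(code: str) -> str:
    """Trim and lower-case a code, then verify it before using it as promptObject.language."""
    normalized = code.strip().lower()
    if normalized not in WHISPER_LANGUAGE_CODES:
        raise ValueError(f"unknown Whisper language code: {code!r}")
    return normalized
```

For example, normalize_language(" EN ") returns "en", while normalize_language("jv") raises ValueError.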

Response Formats

Whisper supports multiple response formats:

text - Plain text transcription (default)
json - JSON object with transcription text
srt - SubRip subtitle format
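verbose_json - JSON with the transcription plus metadata such as the detected language, the audio duration, and per-segment start/end timestamps with log-probability statistics that serve as rough confidence indicators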
vtt - WebVTT subtitle format
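With response_format set to "srt", the transcription is SubRip text: numbered cues separated by blank lines, each with a "start --> end" timing line followed by the spoken text. The sketch below turns that text into (start_seconds, end_seconds, text) tuples. It assumes you have already extracted the SRT string from the API response; the shape of the response envelope is not documented on this page, and parse_srt is a hypothetical helper.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,250".
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert hours, minutes, seconds, milliseconds strings to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> list[tuple[float, float, str]]:
    """Turn SubRip text into (start_seconds, end_seconds, text) tuples."""
    cues = []
    # Cues are separated by blank lines; normalize Windows line endings first.
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is cue text; join multi-line cues.
                text = " ".join(t.strip() for t in lines[i + 1:] if t.strip())
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues
```

Joining multi-line cues with spaces suits search or indexing; keep the original line breaks instead if you are re-rendering subtitles.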

Parameters

Parameter	Type	Required	Description
`type`	string	Yes	Feature type, must be "SPEECH_TO_TEXT"
`model`	string	Yes	Model identifier, use "whisper-1"
`promptObject.audioUrl`	string	Yes	Path to audio file (uploaded via Asset API)
`promptObject.response_format`	string	No	Format of the transcription response: "text", "json", "srt", "verbose_json", or "vtt" (default: "text")
`promptObject.language`	string	No	ISO 639-1 code of the spoken language, e.g. "en" (auto-detected if omitted)
`promptObject.prompt`	string	No	Text that guides the model's style or continues a previous audio segment; it should be in the same language as the audio
`promptObject.temperature`	number	No	Sampling temperature between 0 and 1 (default: 0); higher values make the output more random
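The parameter table can be encoded as a small request builder that rejects invalid values before anything is sent. The function build_transcription_payload below is hypothetical, not part of any 1min.ai SDK, and its checks mirror only the constraints stated in the table; the server may apply additional validation.

```python
from typing import Optional

# Values listed under "Response Formats".
VALID_FORMATS = {"text", "json", "srt", "verbose_json", "vtt"}


def build_transcription_payload(
    audio_url: str,
    response_format: str = "text",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
) -> dict:
    """Hypothetical helper: build the JSON body for a SPEECH_TO_TEXT request."""
    if not audio_url:
        raise ValueError("audio_url is required: pass the path returned by the Asset API")
    if response_format not in VALID_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    prompt_object = {"audioUrl": audio_url, "response_format": response_format}
    # Add optional fields only when supplied, so the API defaults apply otherwise.
    optional = {"language": language, "prompt": prompt, "temperature": temperature}
    prompt_object.update({key: value for key, value in optional.items() if value is not None})

    return {"type": "SPEECH_TO_TEXT", "model": "whisper-1", "promptObject": prompt_object}
```

The returned dict can be passed directly as the json= argument to requests.post.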

Code Examples

cURL

curl --location 'https://api.1min.ai/api/features' \
  --header 'API-KEY: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "SPEECH_TO_TEXT",
    "model": "whisper-1",
    "promptObject": {
      "audioUrl": "audios/2025_10_21_08_22_58_741_whisper_audio.mp3",
      "response_format": "text"
    }
  }'

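JavaScript

// Runs in Node.js 18+, which provides fetch built in.
async function transcribe() {
  const response = await fetch('https://api.1min.ai/api/features', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'API-KEY': 'YOUR_API_KEY'
    },
    body: JSON.stringify({
      type: 'SPEECH_TO_TEXT',
      model: 'whisper-1',
      promptObject: {
        audioUrl: 'audios/2025_10_21_08_22_58_741_whisper_audio.mp3',
        response_format: 'text',
        language: 'en'
      }
    })
  });

  // Surface HTTP errors instead of silently parsing an error body.
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status} ${await response.text()}`);
  }
  return response.json();
}

transcribe().then(console.log).catch(console.error);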

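Python

import requests

url = "https://api.1min.ai/api/features"
headers = {
    "Content-Type": "application/json",
    "API-KEY": "YOUR_API_KEY",
}

data = {
    "type": "SPEECH_TO_TEXT",
    "model": "whisper-1",
    "promptObject": {
        "audioUrl": "audios/2025_10_21_08_22_58_741_whisper_audio.mp3",
        "response_format": "text",
        "language": "en",
    },
}

# Long recordings can take a while to transcribe, so allow a generous timeout.
response = requests.post(url, headers=headers, json=data, timeout=300)
response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
print(response.json())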


Use Cases

Podcast Transcription: Convert podcast episodes and audio content to searchable text
Meeting Documentation: Transcribe business meetings, conferences, and interviews
Educational Content: Create transcripts for lectures, training sessions, and educational videos
Content Creation: Generate text versions of audio content for blogs and articles
Accessibility: Create captions and transcripts for audio/video content
Multilingual Content: Transcribe content in any of the 99 languages Whisper supports
Voice Memos: Convert personal voice recordings to text notes
Customer Service: Transcribe customer calls and support interactions
Media Production: Generate subtitles and closed captions for video content
Research: Transcribe interviews, focus groups, and research recordings

Tips for Best Results

Upload First: Use the Asset API to upload your audio file before transcription
Audio Quality: High-quality audio with clear speech produces the best results
Language Specification: Specifying the language is optional, but doing so can improve both accuracy and latency
Response Format: Choose the appropriate format based on your needs (text, JSON, SRT, etc.)
Temperature Setting: Use 0 (the default) for the most consistent output; higher values make the output more random
File Size: OpenAI's Whisper API accepts files up to 25 MB; split longer recordings into smaller segments before uploading
Background Noise: Minimize background noise for optimal transcription quality
Multiple Speakers: Whisper can transcribe recordings with several speakers, but it does not label who said what, and it works best when speech is clear and speakers do not talk over each other
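Recordings over the size limit can be split before uploading. The sketch below handles uncompressed PCM WAV only, using Python's standard wave module; split_wav is a hypothetical helper, and its 25 MB default reflects OpenAI's Whisper limit, which is an assumption for the 1min.ai endpoint. Compressed formats such as MP3 or M4A need an external tool like ffmpeg instead.

```python
import os
import wave

WAV_HEADER_BYTES = 44  # size of the canonical PCM header the wave module writes


def split_wav(path: str, out_dir: str, max_bytes: int = 25_000_000) -> list[str]:
    """Split a PCM WAV file into consecutive chunks, each at most max_bytes on disk."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    outputs = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frame_bytes = params.nchannels * params.sampwidth  # bytes per sample frame
        frames_per_chunk = (max_bytes - WAV_HEADER_BYTES) // frame_bytes
        if frames_per_chunk <= 0:
            raise ValueError("max_bytes is too small for even one audio frame")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the source file
            out_path = os.path.join(out_dir, f"{base}_part{index:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                # Copy format settings; the frame count in the header is patched on close.
                dst.setparams(params)
                dst.writeframes(frames)
            outputs.append(out_path)
            index += 1
    return outputs
```

Cutting at fixed byte boundaries can split a word across two chunks. Splitting at pauses in speech, or overlapping chunks slightly, usually gives cleaner transcripts; each chunk is then uploaded and transcribed separately.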
