ElevenLabs Voice Cloning
Clone any voice from an audio sample and generate speech with that cloned voice using ElevenLabs' advanced AI voice cloning technology. Upload an audio file containing the voice you want to clone, provide text to convert to speech, and receive high-quality audio output in the cloned voice.
Note: Audio files must first be uploaded using the Asset API before voice cloning. The audioUrl parameter should contain the path returned from the Asset API upload.
Supported Models
elevenlabs-voice-cloning: ElevenLabs Voice Cloning service with Eleven Flash v2.5
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Feature type, must be "VOICE_CLONING" |
| model | string | Yes | Model identifier, use "elevenlabs-voice-cloning" |
| promptObject.audioUrl | string | Yes | Path to audio file containing voice to clone (uploaded via Asset API) |
| promptObject.text | string | Yes | Text to convert to speech using the cloned voice |
| promptObject.output_format | string | No | Output audio format (default: "mp3_44100_128") |
| promptObject.model_id | string | No | ElevenLabs model ID (default: "eleven_flash_v2_5") |
| promptObject.language_code | string | No | Language code for the voice (default: "en") |
| promptObject.remove_background_noise | boolean | No | Remove background noise from source audio (default: false) |
| promptObject.voice_settings.stability | number | No | Voice stability (0.0-1.0, default: 0.5) |
| promptObject.voice_settings.similarity_boost | number | No | Voice similarity boost (0.0-1.0, default: 0.75) |
| promptObject.voice_settings.style | number | No | Voice style exaggeration (0.0-1.0, default: 0.0) |
| promptObject.voice_settings.use_speaker_boost | boolean | No | Use speaker boost for better clarity (default: true) |
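The parameter table maps onto a nested JSON body. As a minimal sketch of one way to assemble it in Python, the helper below (a hypothetical `build_voice_cloning_payload`, not part of any official SDK) sets the two required fields and passes optional fields through only when the caller supplies them, on the assumption that the service then applies its documented defaults.

```python
from typing import Any, Dict, Optional


def build_voice_cloning_payload(
    audio_url: str,
    text: str,
    voice_settings: Optional[Dict[str, Any]] = None,
    **prompt_options: Any,
) -> Dict[str, Any]:
    """Assemble a VOICE_CLONING request body (hypothetical helper)."""
    if not audio_url or not text:
        raise ValueError("audioUrl and text are both required")

    prompt_object: Dict[str, Any] = {"audioUrl": audio_url, "text": text}
    # Optional fields (output_format, model_id, language_code,
    # remove_background_noise) are sent only when the caller sets them.
    prompt_object.update(prompt_options)
    if voice_settings:
        prompt_object["voice_settings"] = dict(voice_settings)

    return {
        "type": "VOICE_CLONING",
        "model": "elevenlabs-voice-cloning",
        "promptObject": prompt_object,
    }


payload = build_voice_cloning_payload(
    "audios/2025_10_21_10_25_35_749_short.mp3",
    "Hello, this is a test of voice cloning technology.",
    voice_settings={"stability": 0.5, "similarity_boost": 0.75},
    output_format="mp3_22050_32",
)
```

The resulting dictionary can be passed as the JSON body of the request.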
Endpoint
POST https://api.1min.ai/api/features
Request Headers
| Field | Value |
|---|---|
| API-KEY | <api-key> |
| Content-Type | application/json |
Supported Audio Formats
Input Formats (Voice Source)
- MP3 - MPEG Audio Layer III
- WAV - Waveform Audio File Format
- M4A - MPEG-4 Audio
- FLAC - Free Lossless Audio Codec
- MP4 - MPEG-4 Part 14 (audio only)
- WEBM - WebM Audio
- OGG - Ogg Vorbis
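A client can reject unsupported source files before uploading them. The sketch below assumes that each listed format corresponds to the usual file extension (for example `.mp3` or `.ogg`); the function name `is_supported_source` is illustrative, and the service may inspect file contents rather than extensions.

```python
from pathlib import PurePosixPath

# Extensions assumed to correspond to the input formats listed above.
SUPPORTED_INPUT_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".flac", ".mp4", ".webm", ".ogg",
}


def is_supported_source(path: str) -> bool:
    """Return True when the file extension matches a listed input format."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS
```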
Output Formats
- mp3_44100_128 - MP3, 44.1kHz, 128kbps (default)
- mp3_44100_64 - MP3, 44.1kHz, 64kbps
- mp3_44100_96 - MP3, 44.1kHz, 96kbps
- mp3_44100_192 - MP3, 44.1kHz, 192kbps
- mp3_22050_32 - MP3, 22.05kHz, 32kbps
- pcm_16000 - PCM, 16kHz
- pcm_22050 - PCM, 22.05kHz
- pcm_24000 - PCM, 24kHz
- pcm_44100 - PCM, 44.1kHz
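The output format identifiers follow a visible pattern: codec, sample rate in Hz, and, for MP3, bitrate in kbps. As an illustration only, the hypothetical `parse_output_format` below splits an identifier into those parts, which can help when choosing a file extension or configuring a player for raw PCM.

```python
from typing import NamedTuple, Optional


class OutputFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None for PCM, which has no bitrate field


def parse_output_format(value: str) -> OutputFormat:
    """Split an identifier such as 'mp3_44100_128' into its components."""
    parts = value.split("_")
    if len(parts) == 3 and parts[0] == "mp3":
        return OutputFormat("mp3", int(parts[1]), int(parts[2]))
    if len(parts) == 2 and parts[0] == "pcm":
        return OutputFormat("pcm", int(parts[1]), None)
    raise ValueError(f"unrecognized output format: {value!r}")
```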
Supported Models
ElevenLabs Voice Models
- eleven_flash_v2_5 - Eleven Flash v2.5 (default, fastest)
- eleven_turbo_v2_5 - Eleven Turbo v2.5 (balanced speed and quality)
- eleven_multilingual_v2 - Eleven Multilingual v2 (supports multiple languages)
Language Support
The API supports automatic language detection for voice cloning and text-to-speech conversion in multiple languages:
- en - English
- es - Spanish
- fr - French
- de - German
- it - Italian
- pt - Portuguese
- ru - Russian
- ja - Japanese
- ko - Korean
- zh - Chinese
- ar - Arabic
- hi - Hindi
ElevenLabs supports additional languages beyond those listed here.
Voice Settings Explained
Stability (0.0 - 1.0)
- Low (0.0-0.3): More variable and expressive, but may be inconsistent
- Medium (0.4-0.7): Balanced stability and expressiveness (recommended)
- High (0.8-1.0): Very stable but may sound monotone
Similarity Boost (0.0 - 1.0)
- Low (0.0-0.3): More creative interpretation of the voice
- Medium (0.4-0.7): Balanced similarity to original voice
- High (0.8-1.0): Maximum similarity to the source voice (recommended)
Style (0.0 - 1.0)
- Low (0.0): Natural speech patterns (recommended for most use cases)
- High (1.0): Exaggerated style and emotion
Speaker Boost
- Enabled: Enhances speaker similarity and audio clarity (recommended)
- Disabled: Standard processing without additional enhancement
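Because three of the four voice settings are bounded to 0.0-1.0, it can be useful to validate them on the client before sending a request. The following is a minimal sketch: `make_voice_settings` is a hypothetical helper that starts from the documented defaults and rejects out-of-range or unknown values. Whether the service itself rejects or clamps such values is not stated here.

```python
from typing import Any, Dict

# Defaults as documented in the parameter table.
DEFAULT_VOICE_SETTINGS: Dict[str, Any] = {
    "stability": 0.5,
    "similarity_boost": 0.75,
    "style": 0.0,
    "use_speaker_boost": True,
}


def make_voice_settings(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the defaults, validating types and ranges."""
    settings = dict(DEFAULT_VOICE_SETTINGS)
    for name, value in overrides.items():
        if name not in settings:
            raise KeyError(f"unknown voice setting: {name}")
        if name == "use_speaker_boost":
            if not isinstance(value, bool):
                raise TypeError("use_speaker_boost must be a boolean")
        elif isinstance(value, bool) or not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a number between 0.0 and 1.0")
        settings[name] = value
    return settings
```

For example, `make_voice_settings(similarity_boost=0.9)` keeps the default stability and style while raising similarity toward the recommended high range.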
Code Example
cURL
curl --location 'https://api.1min.ai/api/features' \
  --header 'API-KEY: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "VOICE_CLONING",
    "model": "elevenlabs-voice-cloning",
    "promptObject": {
      "audioUrl": "audios/2025_10_21_10_25_35_749_short.mp3",
      "text": "Hello, this is a test of voice cloning technology. The AI has learned to speak in my voice.",
      "output_format": "mp3_22050_32",
      "model_id": "eleven_flash_v2_5",
      "language_code": "en",
      "remove_background_noise": true,
      "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0.0,
        "use_speaker_boost": true
      }
    }
  }'
JavaScript
fetch('https://api.1min.ai/api/features', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'API-KEY': 'YOUR_API_KEY'
  },
  body: JSON.stringify({
    type: 'VOICE_CLONING',
    model: 'elevenlabs-voice-cloning',
    promptObject: {
      audioUrl: 'audios/2025_10_21_10_25_35_749_short.mp3',
      text: 'Hello, this is a test of voice cloning technology. The AI has learned to speak in my voice.',
      output_format: 'mp3_22050_32',
      model_id: 'eleven_flash_v2_5',
      language_code: 'en',
      remove_background_noise: true,
      voice_settings: {
        stability: 0.5,
        similarity_boost: 0.75,
        style: 0.0,
        use_speaker_boost: true
      }
    }
  })
})
  // Parse the JSON body and log it; the audio path is in the response.
  .then((response) => response.json())
  .then((data) => console.log(data))
  .catch((error) => console.error(error));
Python
import requests

url = "https://api.1min.ai/api/features"
headers = {
    "Content-Type": "application/json",
    "API-KEY": "YOUR_API_KEY"
}
data = {
    "type": "VOICE_CLONING",
    "model": "elevenlabs-voice-cloning",
    "promptObject": {
        "audioUrl": "audios/2025_10_21_10_25_35_749_short.mp3",
        "text": "Hello, this is a test of voice cloning technology. The AI has learned to speak in my voice.",
        "output_format": "mp3_22050_32",
        "model_id": "eleven_flash_v2_5",
        "language_code": "en",
        "remove_background_noise": True,
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.75,
            "style": 0.0,
            "use_speaker_boost": True
        }
    }
}

# Send the request, fail loudly on HTTP errors, and print the JSON response.
response = requests.post(url, headers=headers, json=data)
response.raise_for_status()
print(response.json())
Interactive Playground
Try the API directly in your browser:
API Playground
Endpoint: https://api.1min.ai/api/features
- audioUrl: Path to the audio file containing the voice to clone (upload via Asset API first)
- text: The text that will be spoken in the cloned voice
- remove_background_noise: Clean up background noise from the source audio
- stability: Voice stability - higher values are more stable but less expressive
- similarity_boost: How closely to match the original voice - higher values for better similarity
- style: Style exaggeration - 0.0 for natural speech, higher for more dramatic delivery
- use_speaker_boost: Enhance speaker similarity and audio clarity
Generated cURL Command:
curl -X POST "https://api.1min.ai/api/features" \
  -H "API-KEY: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "VOICE_CLONING",
    "model": "elevenlabs-voice-cloning",
    "promptObject": {
      "audioUrl": "audios/2025_10_21_10_25_35_749_short.mp3",
      "model_id": "eleven_flash_v2_5",
      "text": "Hello, this is a test of voice cloning technology.",
      "outputFormat": "mp3_44100_128",
      "remove_background_noise": true,
      "stability": 0.5,
      "similarity_boost": 0.75,
      "style": 0,
      "use_speaker_boost": true
    }
  }'
Response Format
The API returns an audio file path that can be accessed via the Asset API:
{
  "aiRecord": {
    "uuid": "ab6fc10b-53f6-46d5-9c43-119723922138",
    "userId": "c937fbcc-fa8f-4565-a440-c4d87f56fcb2",
    "teamId": "a4e176b2-dabb-451e-9c58-62b451fa9630",
    "teamUser": {
      "teamId": "a4e176b2-dabb-451e-9c58-62b451fa9630",
      "userId": "c937fbcc-fa8f-4565-a440-c4d87f56fcb2",
      "userName": "John Doe",
      "userAvatar": "https://lh3.googleusercontent.com/a/ACg8ocLqgsNsHRfmWF9d-E1RvJetVsEzxNOsOg-NXWNTpMxLDPJbwELI=s96-c",
      "status": "ACTIVE",
      "role": "ADMIN",
      "creditLimit": 100000000,
      "usedCredit": 324208,
      "createdAt": "2025-10-20T04:13:40.847Z",
      "createdBy": "SYSTEM",
      "updatedAt": "2025-10-21T10:36:11.166Z",
      "updatedBy": "SYSTEM"
    },
    "model": "elevenlabs-voice-cloning",
    "type": "VOICE_CLONING",
    "metadata": null,
    "rating": null,
    "feedback": null,
    "conversationId": null,
    "status": "SUCCESS",
    "createdAt": "2025-10-21T10:39:31.287Z",
    "aiRecordDetail": {
      "promptObject": {
        "text": "Hello, this is a test of voice cloning technology.",
        "style": 0,
        "audioUrl": "audios/2025_10_21_10_25_35_749_short.mp3",
        "model_id": "eleven_flash_v2_5",
        "stability": 0.5,
        "outputFormat": "mp3_44100_128",
        "similarity_boost": 0.75,
        "use_speaker_boost": true,
        "remove_background_noise": true
      },
      "resultObject": [
        "development/audios/2025_10_21_17_39_38_149_155254.mp3"
      ],
      "responseObject": {}
    },
    "additionalData": null,
    "temporaryUrl": "https://s3.us-east-1.amazonaws.com/asset.1min.ai/development/audios/2025_10_21_17_39_38_149_155254.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAVRUVQEFIHSKAXGE7%2F20251021%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20251021T103939Z&X-Amz-Expires=604800&X-Amz-Signature=e89abbc8095df3d3a857cd7840ba96f0ef0559dcadf7fc11be5393dc5e8fc308&X-Amz-SignedHeaders=host&x-amz-checksum-mode=ENABLED&x-id=GetObject"
  }
}
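In the sample response, the generated audio appears in two places: a stored path in `aiRecord.aiRecordDetail.resultObject` and a pre-signed download link in `aiRecord.temporaryUrl`. The sketch below shows one way to pull both out of a parsed response. The helper name `extract_audio_location` is hypothetical, the error handling is an assumption about how failures might surface, and the sample dictionary is trimmed to the relevant fields with a placeholder URL.

```python
from typing import Any, Dict, Optional, Tuple


def extract_audio_location(body: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Return (stored audio path, temporary download URL) from a response."""
    record = body["aiRecord"]
    # Assumption: any status other than "SUCCESS" means no usable audio.
    if record.get("status") != "SUCCESS":
        raise RuntimeError(f"request did not succeed: {record.get('status')}")
    results = record["aiRecordDetail"]["resultObject"]
    if not results:
        raise RuntimeError("response contains no audio path")
    return results[0], record.get("temporaryUrl")


# Trimmed stand-in for the full response shown above.
sample = {
    "aiRecord": {
        "status": "SUCCESS",
        "aiRecordDetail": {
            "resultObject": [
                "development/audios/2025_10_21_17_39_38_149_155254.mp3"
            ]
        },
        "temporaryUrl": "https://example.invalid/signed-url",
    }
}
audio_path, download_url = extract_audio_location(sample)
```

The `X-Amz-Expires=604800` query parameter in the sample URL suggests the link is valid for seven days, so a client that needs the audio later should store the path rather than the temporary URL.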
Use Cases
- Content Creation: Create consistent voiceovers for videos, podcasts, and presentations
- Personalization: Generate personalized audio messages and notifications
- Accessibility: Convert text to speech using familiar voices for better user experience
- Entertainment: Create character voices for games, animations, and interactive media
- Education: Develop educational content with consistent narrator voices
- Marketing: Create brand-consistent audio content and advertisements
- Audiobooks: Generate audiobook narration in specific voice styles
- Voice Assistants: Build custom voice assistants with unique personality voices
Tips for Best Results
- Quality Source Audio: Use clear, high-quality recordings with minimal background noise
- Speaker Duration: Provide at least 10 seconds of the target voice, ideally around 30 seconds, for better cloning quality
- Clean Audio: Enable remove_background_noise for recordings with background sounds
- Single Speaker: Use audio samples containing only one speaker for best results
- Natural Speech: Source audio should contain natural conversational speech patterns
- File Size: Keep source audio files under 50MB for optimal processing speed
- Voice Settings: Start with default settings and adjust based on your needs:
  - High similarity_boost (0.7-0.9) for close voice matching
  - Medium stability (0.4-0.7) for balanced expression
  - Low style (0.0-0.2) for natural speech
- Text Length: Break long texts into shorter segments for better quality
- Pronunciation: The cloned voice will follow the pronunciation patterns from the source audio
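The tip about breaking long texts into shorter segments can be sketched as a small sentence-aware splitter. The function name `split_text` and the 500-character default are illustrative assumptions, since no specific limit is given here. Each resulting segment would be sent as the `text` of its own request.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 500) -> List[str]:
    """Group whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own
    segment rather than being cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the segment.
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```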