ElevenLabs Voice Cloning

Clone any voice from an audio sample and generate speech with that cloned voice using ElevenLabs' advanced AI voice cloning technology. Upload an audio file containing the voice you want to clone, provide text to convert to speech, and receive high-quality audio output in the cloned voice.

Note: Audio files must first be uploaded using the Asset API before voice cloning. The audioUrl parameter should contain the path returned from the Asset API upload.

Supported Models

  • elevenlabs-voice-cloning: ElevenLabs Voice Cloning service with Eleven Flash v2.5

Parameters

Parameter | Type | Required | Description
type | string | Yes | Feature type, must be "VOICE_CLONING"
model | string | Yes | Model identifier, use "elevenlabs-voice-cloning"
promptObject.audioUrl | string | Yes | Path to audio file containing voice to clone (uploaded via Asset API)
promptObject.text | string | Yes | Text to convert to speech using the cloned voice
promptObject.output_format | string | No | Output audio format (default: "mp3_44100_128")
promptObject.model_id | string | No | ElevenLabs model ID (default: "eleven_flash_v2_5")
promptObject.language_code | string | No | Language code for the voice (default: "en")
promptObject.remove_background_noise | boolean | No | Remove background noise from source audio (default: false)
promptObject.voice_settings.stability | number | No | Voice stability (0.0-1.0, default: 0.5)
promptObject.voice_settings.similarity_boost | number | No | Voice similarity boost (0.0-1.0, default: 0.75)
promptObject.voice_settings.style | number | No | Voice style exaggeration (0.0-1.0, default: 0.0)
promptObject.voice_settings.use_speaker_boost | boolean | No | Use speaker boost for better clarity (default: true)
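
The parameter table can be turned into a small request-body builder. The following is a minimal sketch, not an official client: the function name `build_voice_cloning_payload` is hypothetical, while the field names and defaults are taken from the table above.

```python
from typing import Any, Dict, Optional


def build_voice_cloning_payload(
    audio_url: str,
    text: str,
    output_format: str = "mp3_44100_128",
    model_id: str = "eleven_flash_v2_5",
    language_code: str = "en",
    remove_background_noise: bool = False,
    voice_settings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the JSON body using the documented field names and defaults."""
    # Start from the documented voice_settings defaults, then apply overrides.
    settings: Dict[str, Any] = {
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0.0,
        "use_speaker_boost": True,
    }
    settings.update(voice_settings or {})
    return {
        "type": "VOICE_CLONING",
        "model": "elevenlabs-voice-cloning",
        "promptObject": {
            "audioUrl": audio_url,  # path returned by the Asset API upload
            "text": text,
            "output_format": output_format,
            "model_id": model_id,
            "language_code": language_code,
            "remove_background_noise": remove_background_noise,
            "voice_settings": settings,
        },
    }
```

Only `audio_url` and `text` are required here, mirroring the "Required" column; every other argument falls back to its documented default.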

Endpoint

POST https://api.1min.ai/api/features

Request Headers

Field | Value
API-KEY | <api-key>
Content-Type | application/json
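
As an illustration of the two headers, the sketch below prepares a request with Python's standard library without sending it. The helper name `make_request` is hypothetical, and the URL is the one used in this page's cURL example.

```python
import json
import urllib.request

# Endpoint taken from the cURL example on this page.
API_URL = "https://api.1min.ai/api/features"


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the two headers above."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Note that `urllib` normalizes header capitalization internally, which is harmless because HTTP header names are case-insensitive.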

Supported Audio Formats

Input Formats (Voice Source)

  • MP3 - MPEG Audio Layer III
  • WAV - Waveform Audio File Format
  • M4A - MPEG-4 Audio
  • FLAC - Free Lossless Audio Codec
  • MP4 - MPEG-4 Part 14 (audio only)
  • WEBM - WebM Audio
  • OGG - Ogg Vorbis
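
A client can reject unsupported source files before uploading them. This is a rough sketch that assumes the file extension reflects the actual format, which is not guaranteed; the names `SUPPORTED_INPUT_EXTENSIONS` and `is_supported_source` are hypothetical.

```python
from pathlib import PurePosixPath

# Extensions corresponding to the input formats listed above (assumed mapping).
SUPPORTED_INPUT_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".webm", ".ogg"}


def is_supported_source(path: str) -> bool:
    """Return True if the file extension matches a listed input format."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS
```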

Output Formats

  • mp3_44100_128 - MP3, 44.1kHz, 128kbps (default)
  • mp3_44100_64 - MP3, 44.1kHz, 64kbps
  • mp3_44100_96 - MP3, 44.1kHz, 96kbps
  • mp3_44100_192 - MP3, 44.1kHz, 192kbps
  • mp3_22050_32 - MP3, 22.05kHz, 32kbps
  • pcm_16000 - PCM, 16kHz
  • pcm_22050 - PCM, 22.05kHz
  • pcm_24000 - PCM, 24kHz
  • pcm_44100 - PCM, 44.1kHz
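
The output format identifiers follow a visible pattern: codec, sample rate in Hz, and (for MP3) bitrate in kbps. The sketch below decodes that pattern; `OutputFormat` and `parse_output_format` are hypothetical names, and the parser only covers the identifiers listed above.

```python
from typing import NamedTuple, Optional


class OutputFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None for PCM, which has no bitrate suffix


def parse_output_format(value: str) -> OutputFormat:
    """Split an identifier such as "mp3_44100_128" or "pcm_16000" into its parts."""
    parts = value.split("_")
    if len(parts) == 3 and parts[0] == "mp3":
        return OutputFormat("mp3", int(parts[1]), int(parts[2]))
    if len(parts) == 2 and parts[0] == "pcm":
        return OutputFormat("pcm", int(parts[1]), None)
    raise ValueError(f"unrecognized output format: {value!r}")
```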

Supported Models

ElevenLabs Voice Models (values for promptObject.model_id)

  • eleven_flash_v2_5 - Eleven Flash v2.5 (default, fastest)
  • eleven_turbo_v2_5 - Eleven Turbo v2.5 (balanced speed and quality)
  • eleven_multilingual_v2 - Eleven Multilingual v2 (supports multiple languages)

Language Support

The API supports automatic language detection for voice cloning and text-to-speech conversion in multiple languages:

  • en - English
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese
  • ru - Russian
  • ja - Japanese
  • ko - Korean
  • zh - Chinese
  • ar - Arabic
  • hi - Hindi

ElevenLabs supports many more languages beyond this list.

Voice Settings Explained

Stability (0.0 - 1.0)

  • Low (0.0-0.3): More variable and expressive, but may be inconsistent
  • Medium (0.4-0.7): Balanced stability and expressiveness (recommended)
  • High (0.8-1.0): Very stable but may sound monotone

Similarity Boost (0.0 - 1.0)

  • Low (0.0-0.3): More creative interpretation of the voice
  • Medium (0.4-0.7): Balanced similarity to original voice
  • High (0.8-1.0): Maximum similarity to the source voice (recommended)

Style (0.0 - 1.0)

  • Low (0.0): Natural speech patterns (recommended for most use cases)
  • High (1.0): Exaggerated style and emotion

Speaker Boost

  • Enabled: Enhances speaker similarity and audio clarity (recommended)
  • Disabled: Standard processing without additional enhancement
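
The ranges above can be checked on the client before a request is sent. The following sketch uses a hypothetical helper, `validate_voice_settings`, and returns a list of problems instead of raising, so a caller can report all of them at once. How the API itself reacts to out-of-range values is not documented here.

```python
from typing import Any, Dict, List


def validate_voice_settings(settings: Dict[str, Any]) -> List[str]:
    """Check a voice_settings dict against the documented 0.0-1.0 ranges."""
    problems: List[str] = []
    for name in ("stability", "similarity_boost", "style"):
        if name not in settings:
            continue  # optional field; the documented default applies
        value = settings[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be between 0.0 and 1.0")
    boost = settings.get("use_speaker_boost", True)
    if not isinstance(boost, bool):
        problems.append("use_speaker_boost must be a boolean")
    return problems
```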

Code Example

curl --location 'https://api.1min.ai/api/features' \
  --header 'API-KEY: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "VOICE_CLONING",
    "model": "elevenlabs-voice-cloning",
    "promptObject": {
      "audioUrl": "audios/2025_10_21_10_25_35_749_short.mp3",
      "text": "Hello, this is a test of voice cloning technology. The AI has learned to speak in my voice.",
      "output_format": "mp3_22050_32",
      "model_id": "eleven_flash_v2_5",
      "language_code": "en",
      "remove_background_noise": true,
      "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0.0,
        "use_speaker_boost": true
      }
    }
  }'

Response Format

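The API returns a JSON object. The storage path of the generated audio is in aiRecord.aiRecordDetail.resultObject and can be accessed via the Asset API. aiRecord.temporaryUrl is a pre-signed download link (in this sample it is valid for 604800 seconds, i.e. 7 days):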

{
  "aiRecord": {
    "uuid": "ab6fc10b-53f6-46d5-9c43-119723922138",
    "userId": "c937fbcc-fa8f-4565-a440-c4d87f56fcb2",
    "teamId": "a4e176b2-dabb-451e-9c58-62b451fa9630",
    "teamUser": {
      "teamId": "a4e176b2-dabb-451e-9c58-62b451fa9630",
      "userId": "c937fbcc-fa8f-4565-a440-c4d87f56fcb2",
      "userName": "John Doe",
      "userAvatar": "https://lh3.googleusercontent.com/a/ACg8ocLqgsNsHRfmWF9d-E1RvJetVsEzxNOsOg-NXWNTpMxLDPJbwELI=s96-c",
      "status": "ACTIVE",
      "role": "ADMIN",
      "creditLimit": 100000000,
      "usedCredit": 324208,
      "createdAt": "2025-10-20T04:13:40.847Z",
      "createdBy": "SYSTEM",
      "updatedAt": "2025-10-21T10:36:11.166Z",
      "updatedBy": "SYSTEM"
    },
    "model": "elevenlabs-voice-cloning",
    "type": "VOICE_CLONING",
    "metadata": null,
    "rating": null,
    "feedback": null,
    "conversationId": null,
    "status": "SUCCESS",
    "createdAt": "2025-10-21T10:39:31.287Z",
    "aiRecordDetail": {
      "promptObject": {
        "text": "Hello, this is a test of voice cloning technology.",
        "style": 0,
        "audioUrl": "audios/2025_10_21_10_25_35_749_short.mp3",
        "model_id": "eleven_flash_v2_5",
        "stability": 0.5,
        "outputFormat": "mp3_44100_128",
        "similarity_boost": 0.75,
        "use_speaker_boost": true,
        "remove_background_noise": true
      },
      "resultObject": [
        "development/audios/2025_10_21_17_39_38_149_155254.mp3"
      ],
      "responseObject": {}
    },
    "additionalData": null,
    "temporaryUrl": "https://s3.us-east-1.amazonaws.com/asset.1min.ai/development/audios/2025_10_21_17_39_38_149_155254.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAVRUVQEFIHSKAXGE7%2F20251021%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20251021T103939Z&X-Amz-Expires=604800&X-Amz-Signature=e89abbc8095df3d3a857cd7840ba96f0ef0559dcadf7fc11be5393dc5e8fc308&X-Amz-SignedHeaders=host&x-amz-checksum-mode=ENABLED&x-id=GetObject"
  }
}
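
Reading the audio location out of a response like the sample above might look as follows. This is a sketch based only on the sample: the function name `extract_audio_result` is hypothetical, and the shape of a non-success response is an assumption, since only a "SUCCESS" record is shown on this page.

```python
from typing import Any, Dict, Optional, Tuple


def extract_audio_result(response: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Return (asset_path, temporary_url) from a parsed response body."""
    record = response["aiRecord"]
    # Assumption: failed jobs carry a status other than "SUCCESS".
    if record.get("status") != "SUCCESS":
        raise RuntimeError(f"voice cloning did not succeed: {record.get('status')!r}")
    paths = record["aiRecordDetail"]["resultObject"]
    if not paths:
        raise RuntimeError("response contains no audio path")
    # The sample holds a single path; take the first entry.
    return paths[0], record.get("temporaryUrl")
```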

Use Cases

  • Content Creation: Create consistent voiceovers for videos, podcasts, and presentations
  • Personalization: Generate personalized audio messages and notifications
  • Accessibility: Convert text to speech using familiar voices for better user experience
  • Entertainment: Create character voices for games, animations, and interactive media
  • Education: Develop educational content with consistent narrator voices
  • Marketing: Create brand-consistent audio content and advertisements
  • Audiobooks: Generate audiobook narration in specific voice styles
  • Voice Assistants: Build custom voice assistants with unique personality voices

Tips for Best Results

  1. Quality Source Audio: Use clear, high-quality recordings with minimal background noise
  2. Sample Duration: Provide at least 10 seconds of the target voice; samples of around 30 seconds tend to give better cloning quality
  3. Clean Audio: Enable remove_background_noise for recordings with background sounds
  4. Single Speaker: Use audio samples containing only one speaker for best results
  5. Natural Speech: Source audio should contain natural conversational speech patterns
  6. File Size: Keep source audio files under 50MB for optimal processing speed
  7. Voice Settings: Start with default settings and adjust based on your needs:
    • High similarity_boost (0.7-0.9) for close voice matching
    • Medium stability (0.4-0.7) for balanced expression
    • Low style (0.0-0.2) for natural speech
  8. Text Length: Break long texts into shorter segments for better quality
  9. Pronunciation: The cloned voice will follow the pronunciation patterns from the source audio
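
Tip 8 (breaking long texts into shorter segments) can be sketched as a sentence-aware splitter. The helper name `split_into_chunks` and the 500-character default are hypothetical choices, since no length limit is stated on this page.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 500) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole, not cut mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as the `text` of its own request, and the resulting audio files joined afterwards.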

Error Handling

Common error scenarios and solutions:

  • File not found: Ensure the source audio file was uploaded via Asset API first
  • Invalid audioUrl: Verify the path matches exactly what was returned from Asset API upload
  • Poor cloning quality: Try using cleaner source audio or adjusting voice settings
  • Voice creation failed: Check that the source audio contains clear speech from a single speaker
  • Text too long: Break long texts into smaller chunks for better processing
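
Some of these errors can be caught before a request is sent. The sketch below is a hypothetical pre-flight check (`preflight_problems`); the 1000-character threshold is an arbitrary placeholder, not a documented limit, and the check cannot tell whether the path actually exists in the Asset API.

```python
from typing import Any, Dict, List


def preflight_problems(prompt_object: Dict[str, Any], max_text_chars: int = 1000) -> List[str]:
    """Return human-readable problems found in a promptObject before sending it."""
    problems: List[str] = []
    audio_url = prompt_object.get("audioUrl", "")
    text = prompt_object.get("text", "")
    if not isinstance(audio_url, str) or not audio_url.strip():
        problems.append("audioUrl is missing: upload the source audio via the Asset API first")
    if not isinstance(text, str) or not text.strip():
        problems.append("text is empty")
    elif len(text) > max_text_chars:  # placeholder threshold, see lead-in
        problems.append("text is long: consider splitting it into smaller chunks")
    return problems
```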

Rate Limits

  • Voice cloning operations are resource-intensive and may have lower rate limits
  • Consider implementing queuing for bulk voice cloning operations
  • Monitor your usage to avoid hitting concurrent processing limits
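
One way to queue bulk operations is to cap the number of requests in flight with a thread pool. In this sketch the function name `run_bounded` and the default of 2 concurrent jobs are hypothetical, because the actual limits are not stated here, and `submit` stands for whatever function sends one request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_bounded(
    jobs: Iterable[dict],
    submit: Callable[[dict], dict],
    max_concurrent: int = 2,
) -> List[dict]:
    """Run submit(job) for every job with at most max_concurrent in flight.

    Results are returned in the same order as the input jobs.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(submit, jobs))
```

If `submit` raises for one job, the exception surfaces when its result is collected, so callers who want partial results should catch errors inside `submit`.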