Audio & Speech API Pricing

79 models — TTS, ASR, and realtime speech

This page tracks 79 audio and speech AI models across TTS, ASR, and realtime speech. The lowest current price is $0.0025/min, shared by Universal-2 and Universal Streaming. Of the 79 price sources, 43 are auto-scraped and 36 are manually maintained. Click "Detail →" on any model for per-provider comparisons.

Model | From | Detail
Universal-2
$0.0025/min
Detail →
Universal Streaming
$0.0025/min
Detail →
Universal-3 Pro
$0.0035/min
Detail →
Nova-2
$0.0043/min
Detail →
Nova-3
$0.0043/min
Detail →
Whisper Large
$0.0048/min
Detail →
Gemini 2.5 Flash TTS
$0.0050/1K chars
Detail →
Whisper 1
$0.0060/min
Detail →
suno_uploads
$0.0068/req
Detail →
Universal-3 Pro Streaming
$0.0075/min
Detail →
Speech 2.8 Turbo
$0.0080/1K chars
Detail →
Kling Audio
$0.0100/second
Detail →
audio1.0
$0.0109/req
Detail →
speech-02-hd
$0.0109/req
Detail →
speech-02-turbo
$0.0109/req
Detail →
speech-2.6-hd
$0.0109/req
Detail →
speech-2.6-turbo
$0.0109/req
Detail →
speech-2.8-hd
$0.0109/req
Detail →
speech-2.8-turbo
$0.0109/req
Detail →
MiniMax Speech 02 HD
$0.0120/1K chars
Detail →
Base
$0.0125/min
Detail →
Speech 2.8 HD
$0.0140/1K chars
Detail →
Enhanced
$0.0145/min
Detail →
Nova-2 Medical
$0.0148/min
Detail →
Aura-2
$0.0150/1K chars
Detail →
Aura Asteria
$0.0150/1K chars
Detail →
Aura Luna
$0.0150/1K chars
Detail →
Aura Stella
$0.0150/1K chars
Detail →
TTS-1
$0.0150/1K chars
Detail →
suno_concat_open
$0.0182/req
Detail →
suno_lyrics_open
$0.0182/req
Detail →
suno_music_open
$0.0182/req
Detail →
suno_persona_open
$0.0182/req
Detail →
suno_upload_open
$0.0182/req
Detail →
suno_upsample_open
$0.0182/req
Detail →
suno_upsample-tags
$0.0182/req
Detail →
Sonic Turbo
$0.0300/1K chars
Detail →
TTS-1 HD
$0.0300/1K chars
Detail →
Kling Advanced Lip Sync
$0.0500/second
Detail →
Suno Music
$0.0500/song
Detail →
Sonic 2
$0.0650/1K chars
Detail →
Sonic 2 Preview
$0.0650/1K chars
Detail →
Sonic English
$0.0650/1K chars
Detail →
Sonic Multilingual
$0.0650/1K chars
Detail →
Eleven Flash v2.5
$0.0800/1K chars
Detail →
GPT-4o Audio
$0.1000/min
Detail →
Eleven Turbo v2.5
$0.1500/1K chars
Detail →
gpt-4o-mini-tts
$0.2048/req
Detail →
Eleven English v2
$0.3000/1K chars
Detail →
Eleven Monolingual v1
$0.3000/1K chars
Detail →
Eleven Multilingual v1
$0.3000/1K chars
Detail →
Eleven Multilingual v2
$0.3000/1K chars
Detail →
gemini-2.5-flash-preview-tts
Detail →
gemini-2.5-pro-preview-tts
Detail →
gemini-3.1-flash-tts-preview
Detail →
Google Veo 3.0 + Audio
Detail →
Google Veo 3.0 Fast + Audio
Detail →
gpt-4o-audio-preview
Detail →
gpt-4o-audio-preview-2024-12-17
Detail →
gpt-4o-audio-preview-2025-06-03
Detail →
gpt-4o-mini-audio-preview
Detail →
gpt-4o-mini-audio-preview-2024-12-17
Detail →
gpt-4o-mini-tts-1
Detail →
gpt-4o-mini-tts-2025-03-20
Detail →
gpt-audio
Detail →
gpt-audio-2025-08-28
Detail →
Kokoro-82M TTS
Detail →
GPT Audio
Detail →
GPT Audio Mini
Detail →
qwen3-tts-flash
Detail →
qwen3-tts-flash-2025-11-27
Detail →
tts-1
Detail →
tts-1-1106
Detail →
tts-1-hd
Detail →
tts-1-hd-1106
Detail →
tts-hd-1
Detail →
whisper-1
Detail →
Whisper Large v3
Detail →
Whisper Large v3 (Streaming)
Detail →

Popular Models

Universal-2, Universal Streaming, Universal-3 Pro, Nova-2, Nova-3, Whisper Large

FAQ

What types of audio AI APIs are available?

Three main categories: TTS (text-to-speech), ASR (automatic speech recognition), and Realtime APIs for live voice interaction. This page covers 79 models; 43 price sources are auto-scraped and 36 are manually maintained.

How is audio API pricing calculated?

TTS is billed per 1K or 1M characters, ASR per minute of audio processed, and Realtime per session-minute. Some models in the table are instead billed per request, per second, or per song. Click "Detail →" on any model to see exact per-provider rates.
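As a rough illustration of the billing units described above, the following sketch estimates a TTS bill from a character count and an ASR bill from audio minutes. The function names are hypothetical, the sample prices are taken from the table on this page, and actual invoices may differ (rounding, minimum charges, and tiering are not modeled).

```python
from decimal import Decimal


def tts_cost(chars: int, price_per_1k_chars: str) -> Decimal:
    """Estimated TTS cost for `chars` characters at a per-1K-character price."""
    return Decimal(chars) / Decimal(1000) * Decimal(price_per_1k_chars)


def asr_cost(minutes: int, price_per_min: str) -> Decimal:
    """Estimated ASR cost for `minutes` of audio at a per-minute price."""
    return Decimal(minutes) * Decimal(price_per_min)


def per_1m_to_per_1k(price_per_1m_chars: str) -> Decimal:
    """Convert a per-1M-character price to a per-1K-character price."""
    return Decimal(price_per_1m_chars) / Decimal(1000)


# 250,000 characters at the listed Aura-2 price of $0.0150/1K chars
aura_estimate = tts_cost(250_000, "0.0150")

# 600 minutes of audio at the listed Universal-2 price of $0.0025/min
universal_estimate = asr_cost(600, "0.0025")

print(f"TTS estimate: ${aura_estimate:.2f}")
print(f"ASR estimate: ${universal_estimate:.2f}")
```

Decimal is used here instead of float so that small per-unit prices do not pick up binary rounding noise when multiplied by large volumes.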

Which TTS or ASR API is best value?

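Currently the lowest price is $0.0025/min, from Universal-2 and Universal Streaming. Prices in different units (per minute, per 1K characters, per request) are not directly comparable, so the best value depends on your workload, volume, and quality requirements. Click "Detail →" to compare all providers for each model.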
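To make the volume point concrete, here is a minimal sketch that ranks a few per-minute models by estimated monthly cost. The prices come from the table on this page; the helper name and the 10,000 minutes/month figure are assumptions for illustration, and the comparison only covers price, not accuracy or latency.

```python
from decimal import Decimal

# Per-minute prices as listed on this page (only same-unit models are compared).
PRICES_PER_MIN = {
    "Universal-2": Decimal("0.0025"),
    "Universal-3 Pro": Decimal("0.0035"),
    "Nova-3": Decimal("0.0043"),
    "Whisper Large": Decimal("0.0048"),
}


def rank_by_monthly_cost(prices, minutes_per_month):
    """Return (model, estimated monthly cost) pairs, cheapest first."""
    costs = {name: price * minutes_per_month for name, price in prices.items()}
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical workload: 10,000 minutes of audio per month.
ranking = rank_by_monthly_cost(PRICES_PER_MIN, 10_000)

for name, cost in ranking:
    print(f"{name}: ${cost:.2f}/month")
```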
