Deploy Voxtral Small 24B with advanced audio AI capabilities
Run Mistral's enhanced 24B model with state-of-the-art speech processing on our cloud infrastructure. Get voice transcription, multilingual support, and function calling with complete data privacy.

Why Voxtral Small pairs advanced audio AI with strong text performance
Advanced audio understanding
Process speech with a dedicated transcription mode, long-form context handling of up to 32k tokens, and built-in Q&A for analyzing audio content.
Native multilingual support
Handle 8 languages seamlessly with voice-to-text transcription, translation capabilities, and cultural context awareness for global applications.
Function calling from voice
Execute functions directly from voice commands while maintaining best-in-class text performance. Perfect for voice-activated workflows and interactive applications.
Built for advanced audio AI and enterprise text processing

Speech transcription mode
Dedicated transcription capabilities with high accuracy for voice-to-text conversion and long-form audio processing.
32k token context length
Handle extensive conversations, long documents, and complex audio sessions with extended context understanding.
Built-in Q&A and summarization
Extract insights from audio content with native question-answering and automatic summarization capabilities.
Voice function calling
Execute complex workflows directly from voice commands with structured function integration and response handling.
Best-in-class text performance
Retain Mistral Small's exceptional text understanding while adding powerful audio processing capabilities.
Audio understanding engine
Advanced audio comprehension with context awareness, emotion detection, and multi-speaker recognition capabilities.
Perfect for voice-enabled and multimedia applications
Voice assistants
Intelligent conversational AI
Build sophisticated voice assistants with multilingual support, context retention, and function calling capabilities for customer service, smart homes, and enterprise applications.
Content transcription
Professional audio processing
Transform podcasts, meetings, interviews, and multimedia content into searchable text with speaker identification, timestamps, and automatic summarization.
Multilingual applications
Global communication solutions
Deploy across 8 languages with native understanding, cultural context awareness, and real-time translation capabilities for international businesses.
Interactive workflows
Voice-controlled automation
Enable hands-free operation with voice-triggered functions, complex workflow execution, and intelligent response generation for productivity applications.
How Inference works with Voxtral Small
Audio AI infrastructure optimized for speech processing and text understanding
01
Select audio AI configuration
Choose from pre-configured Voxtral Small 24B instances optimized for speech processing, transcription, and multilingual applications.
02
Deploy across global infrastructure
Launch your private Voxtral instance with smart routing optimized for low-latency audio processing and real-time transcription.
03
Scale voice applications without limits
Process unlimited audio requests at a fixed monthly cost. Scale your voice applications without worrying about per-call API fees.
With Inference, you get enterprise-grade infrastructure management while maintaining complete control over your audio AI deployment.
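Whether a fixed monthly price beats per-call fees depends on volume. The sketch below works out the break-even point. Both prices are hypothetical placeholders, not actual Inference or API pricing, and amounts are kept in whole cents to avoid floating-point rounding.

```python
def break_even_requests(fixed_monthly_cents: int, per_request_cents: int) -> int:
    """Return the smallest monthly request count at which a flat fee
    costs no more than paying per request (ceiling division)."""
    if fixed_monthly_cents < 0 or per_request_cents <= 0:
        raise ValueError("fixed cost must be non-negative and per-request fee positive")
    # -(-a // b) is integer ceiling division.
    return -(-fixed_monthly_cents // per_request_cents)


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
FIXED_MONTHLY_CENTS = 200_000  # assumed flat price of a dedicated instance ($2,000)
PER_REQUEST_CENTS = 2          # assumed price of one call on a metered API ($0.02)

threshold = break_even_requests(FIXED_MONTHLY_CENTS, PER_REQUEST_CENTS)
print(f"Flat pricing breaks even at {threshold} requests per month")
```

Below the threshold a metered API is cheaper; above it, every additional request on the fixed plan adds no cost.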
Ready-to-use audio AI solutions
Voice transcription platform
Build comprehensive speech-to-text solutions with multilingual support, speaker identification, and automated summarization capabilities.

Interactive voice applications
Deploy voice-controlled workflows with function calling, context retention, and intelligent response generation for hands-free operation.

Multimedia content analysis
Process audio and text content simultaneously with Q&A extraction, summarization, and multilingual understanding for content platforms.

Frequently asked questions
How does Voxtral Small enhance Mistral Small with audio capabilities?
Voxtral Small 24B builds on Mistral Small's text performance by adding state-of-the-art audio input. You get a dedicated transcription mode, long-form context handling, built-in Q&A, and function calling directly from voice, while keeping best-in-class text understanding.
What audio processing capabilities are included?
Voxtral includes speech transcription, audio understanding with context awareness, multilingual support for 8 languages, automatic summarization of audio content, Q&A extraction from speech, and voice-triggered function calling with structured responses.
How does the 32k token context length benefit audio applications?
The extended context allows processing of long-form audio content like podcasts, meetings, or interviews while maintaining conversation history. This enables better understanding of context, speaker relationships, and topic continuity across extended sessions.
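A fixed context window also caps how much audio fits in a single request, so recordings beyond that cap need to be split. The following is a minimal sketch of such a split, with a small overlap so words at a boundary are not lost. The 30-minute cap and 60-second overlap are assumed values for illustration; confirm the actual limit for your deployment. The function name `plan_segments` is hypothetical.

```python
def plan_segments(total_seconds: int, max_segment_seconds: int,
                  overlap_seconds: int = 0) -> list[tuple[int, int]]:
    """Split a recording into (start, end) windows no longer than
    max_segment_seconds, each overlapping the previous by overlap_seconds."""
    if max_segment_seconds <= 0:
        raise ValueError("max_segment_seconds must be positive")
    if not 0 <= overlap_seconds < max_segment_seconds:
        raise ValueError("overlap must be non-negative and shorter than a segment")
    if total_seconds <= 0:
        return []

    segments = []
    start = 0
    step = max_segment_seconds - overlap_seconds  # how far each window advances
    while True:
        end = min(start + max_segment_seconds, total_seconds)
        segments.append((start, end))
        if end >= total_seconds:
            break
        start += step
    return segments


# Assumed limits: 30 minutes per request, 60 seconds of overlap.
for start, end in plan_segments(total_seconds=75 * 60,
                                max_segment_seconds=30 * 60,
                                overlap_seconds=60):
    print(f"segment {start}s to {end}s")
```

Each segment can then be sent as its own request, with the overlapping seconds de-duplicated when the transcripts are stitched back together.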
Can I use function calling directly from voice commands?
Yes, Voxtral supports native function calling triggered by voice input. You can execute complex workflows, API calls, and structured operations directly from speech while maintaining the full context and generating intelligent responses.
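On the application side, a voice-triggered function call still has to be routed to real code. The sketch below shows one way to do that. It assumes the model returns a tool call as a function name plus JSON-encoded arguments, a shape modeled on common chat-completion APIs rather than taken from Voxtral's documentation. The `set_thermostat` function and the example payload are hypothetical.

```python
import json
from typing import Any, Callable

# Registry mapping tool names to the Python functions that implement them.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a function under its own name."""
    TOOLS[func.__name__] = func
    return func


@tool
def set_thermostat(room: str, celsius: float) -> str:
    # Hypothetical action; a real system would call a device API here.
    return f"{room} set to {celsius} C"


def dispatch(tool_call: dict) -> Any:
    """Look up the requested tool and call it with the decoded arguments."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    arguments = json.loads(tool_call["arguments"])
    return TOOLS[name](**arguments)


# Assumed shape of a tool call produced from the spoken request
# "set the kitchen to twenty-one and a half degrees".
example_call = {
    "name": "set_thermostat",
    "arguments": '{"room": "kitchen", "celsius": 21.5}',
}
print(dispatch(example_call))
```

Keeping an explicit registry means the model can only trigger functions that were deliberately exposed, which matters when the input is free-form speech.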
What languages are supported for audio processing?
Voxtral Small provides native multilingual support for 8 languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian) with voice-to-text transcription, translation capabilities, and cultural context understanding. This makes it well suited to global applications that need multilingual audio processing.
Deploy Voxtral Small 24B today
Get advanced audio AI capabilities with enterprise text performance. Start with predictable pricing and unlimited usage for your voice applications.