Deploy Voxtral Mini 3B with advanced audio AI capabilities
Run this 3B-parameter model with state-of-the-art audio input. Get speech transcription, translation, and voice function calling with a 32k-token context window.

Why Voxtral Mini delivers powerful audio AI capabilities
Advanced audio understanding
Process speech, audio, and voice inputs with a dedicated transcription mode and built-in Q&A capabilities. Perfect for voice-first applications and audio analysis.
Multilingual voice support
Support for 8 languages with speech translation capabilities. Handle global voice applications with seamless language processing.
Function calling from voice
Call functions straight from voice inputs, with long-form context support up to 32k tokens. Build sophisticated voice-controlled applications.
Built for modern audio AI applications

Speech transcription mode
Dedicated transcription capabilities for high-accuracy speech-to-text conversion across multiple languages and audio formats.
32k context length
Process long-form audio content and maintain context across extended conversations and complex audio documents.
Audio summarization
Built-in summarization capabilities for audio content, perfect for meeting transcripts, podcasts, and voice recordings.
Voice-to-function calling
Execute functions directly from voice commands, enabling hands-free operation and natural voice interfaces for applications.
Best-in-class text performance
Maintains exceptional text understanding while adding audio capabilities, ensuring versatility across all content types.
Real-time processing
Optimized for low-latency audio processing, perfect for live transcription, voice assistants, and interactive applications.
Perfect for voice-enabled applications
Voice assistants
Natural conversation interfaces
Build sophisticated voice assistants with multilingual support, function calling, and long-form context understanding. Perfect for customer service, smart home, and business applications.
Meeting transcription
Real-time speech-to-text
Transform meetings, interviews, and conferences into accurate transcripts with summarization and multilingual support across 8 languages.
Audio content analysis
Podcast and media processing
Analyze podcasts, videos, and audio content for insights, generate summaries, and extract key information with advanced audio understanding capabilities.
Accessibility solutions
Voice-to-text applications
Create accessible applications with real-time transcription and voice navigation for users with disabilities.
How Inference works
AI infrastructure optimized for audio processing with Voxtral Mini 3B
01. Choose your audio configuration
Select from pre-configured Voxtral Mini 3B instances optimized for speech processing, transcription, or voice-enabled applications.
02. Deploy in 3 clicks
Launch your private Voxtral Mini instance across our global infrastructure with smart routing optimized for low-latency audio processing.
03. Scale voice applications
Use your model with unlimited audio processing at a fixed monthly cost. Scale your voice applications without worrying about per-call API fees.
With Inference, you get enterprise-grade infrastructure management with specialized audio processing capabilities and complete deployment control.
Ready-to-use audio solutions
Voice application platform
Build and deploy voice-enabled applications with transcription, translation, and function calling capabilities.

Audio processing pipeline
Process large volumes of audio content with automated transcription, summarization, and multilingual analysis.

Real-time voice interface
Deploy interactive voice interfaces with function calling, context awareness, and natural conversation flow.

Frequently asked questions
What audio formats does Voxtral Mini 3B support?
Voxtral Mini 3B supports common audio formats including WAV, MP3, and FLAC. It's optimized for speech processing, with a dedicated transcription mode for high-accuracy conversion across multiple sampling rates and audio qualities.
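As a rough illustration, an application might check a file before sending it for transcription. The sketch below is a minimal pre-upload check, not part of any official SDK: the helper names (`is_supported_audio`, `wav_info`, `make_silent_wav`) are hypothetical, the extension list simply mirrors the formats named above, and only WAV metadata is inspected because that is what Python's standard library can read.

```python
import io
import wave
from pathlib import Path

# Formats named in the answer above; this sketch rejects anything else.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def wav_info(wav_bytes: bytes) -> dict:
    """Read channel count, sampling rate, and duration from WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_silent_wav(seconds: float, sample_rate_hz: int = 16000) -> bytes:
    """Build a mono 16-bit WAV of silence, used here only as sample input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate_hz))
    return buffer.getvalue()
```

A check like this can catch unsupported files and report duration before any audio leaves the client.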
How many languages are supported for voice processing?
Voxtral Mini 3B provides native multilingual support for 8 languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian) with built-in speech translation capabilities. This includes real-time transcription, translation, and voice understanding across supported languages.
Can I use voice commands to trigger functions in my application?
Yes, Voxtral Mini 3B supports function calling straight from voice inputs. You can build applications that execute specific functions based on voice commands, with full context awareness up to 32k tokens.
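On the application side, voice function calling usually comes down to routing the model's structured output to your own code. The sketch below assumes the model returns a tool call shaped like `{"name": ..., "arguments": "<JSON string>"}`, a convention common to OpenAI-style APIs; the exact response format of your deployment may differ, and the functions `set_thermostat` and `turn_off_lights` are hypothetical examples.

```python
import json
from typing import Any, Callable


# Hypothetical application functions that a voice command may trigger.
def set_thermostat(room: str, celsius: float) -> str:
    return f"{room} set to {celsius:.1f} C"


def turn_off_lights(room: str) -> str:
    return f"Lights off in {room}"


# Registry mapping the function names exposed to the model to local callables.
TOOLS: dict[str, Callable[..., str]] = {
    "set_thermostat": set_thermostat,
    "turn_off_lights": turn_off_lights,
}


def dispatch_tool_call(tool_call: dict[str, Any]) -> str:
    """Run the local function named in a tool call.

    Assumes `tool_call` has a "name" key and an "arguments" key holding
    a JSON-encoded object of keyword arguments.
    """
    name = tool_call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown function: {name}")
    arguments = json.loads(tool_call["arguments"])
    return TOOLS[name](**arguments)
```

For example, a spoken request such as "set the kitchen to 21 degrees" could come back as a `set_thermostat` call with `room` and `celsius` arguments, which `dispatch_tool_call` then executes. Rejecting unknown names keeps the model from invoking anything outside the registry.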
What's the difference between this and text-only models?
Voxtral Mini 3B enhances Ministral 3B with state-of-the-art audio input capabilities while retaining best-in-class text performance. You get both audio processing and excellent text understanding in a single model.
How does the 32k context length benefit audio applications?
The 32k context length allows processing of long-form audio content: roughly 30 minutes of audio for transcription or 40 minutes for audio understanding in a single request. That means maintaining conversation context across extended interactions and handling full meetings or lengthy recordings without losing context.
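Recordings that exceed what fits in one request can be split into segments and processed in turn. The sketch below shows one simple way to plan those segments with a small overlap so that words at a boundary are not cut off. The function name `plan_chunks` is hypothetical, and the maximum segment length is a parameter you would set from the limits of your own deployment rather than a value fixed by the model.

```python
def plan_chunks(
    total_s: float, max_chunk_s: float, overlap_s: float = 0.0
) -> list[tuple[float, float]]:
    """Split a recording into (start, end) segments, in seconds.

    Each segment is at most `max_chunk_s` long, and consecutive segments
    share `overlap_s` seconds of audio.
    """
    if max_chunk_s <= 0:
        raise ValueError("max_chunk_s must be positive")
    if not 0 <= overlap_s < max_chunk_s:
        raise ValueError("overlap_s must be in [0, max_chunk_s)")

    chunks: list[tuple[float, float]] = []
    step = max_chunk_s - overlap_s  # how far each new segment advances
    start = 0.0
    while start < total_s:
        end = min(start + max_chunk_s, total_s)
        chunks.append((start, end))
        if end >= total_s:
            break  # the last segment reached the end of the recording
        start += step
    return chunks
```

For a one-hour recording with 25-minute segments and a one-minute overlap, `plan_chunks(3600, 1500, 60)` yields three segments. Transcripts from overlapping segments then need de-duplication at the seams, which this sketch leaves to the caller.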
Deploy Voxtral Mini 3B today
Get advanced audio AI capabilities with speech transcription, multilingual support, and voice function-calling. Start with predictable pricing and unlimited usage.