Deploy Voxtral Mini 3B with advanced audio AI capabilities

Run this 3B-parameter model with state-of-the-art audio input. Get speech transcription, translation, and voice function calling with a 32k-token context length.

Deploy now

Why Voxtral Mini delivers powerful audio AI capabilities

Advanced audio understanding

Process speech and other audio inputs with a dedicated transcription mode and built-in Q&A. Perfect for voice-first applications and audio analysis.

Multilingual voice support

Supports 8 languages, with speech translation built in. Handle global voice applications without a separate translation step.

Function calling from voice

Call functions directly from voice input, with long-form context support up to 32k tokens. Build sophisticated voice-controlled applications.

Built for modern audio AI applications

Voxtral Mini 3B on Inference combines best-in-class text performance with cutting-edge audio capabilities.

Speech transcription mode

Dedicated transcription capabilities for high-accuracy speech-to-text conversion across multiple languages and audio formats.

32k context length

Process long-form audio content and maintain context across extended conversations and complex audio documents.

Audio summarization

Built-in summarization capabilities for audio content, perfect for meeting transcripts, podcasts, and voice recordings.

Voice-to-function calling

Execute functions directly from voice commands, enabling hands-free operation and natural voice interfaces for applications.

Best-in-class text performance

Maintains exceptional text understanding while adding audio capabilities, ensuring versatility across all content types.

Real-time processing

Optimized for low-latency audio processing, perfect for live transcription, voice assistants, and interactive applications.

Perfect for voice-enabled applications

Voice assistants

Natural conversation interfaces

Build sophisticated voice assistants with multilingual support, function calling, and long-form context understanding. Perfect for customer service, smart home, and business applications.

Meeting transcription

Real-time speech-to-text

Transform meetings, interviews, and conferences into accurate transcripts with speaker identification, summarization, and multilingual support across 8 languages.

Audio content analysis

Podcast and media processing

Analyze podcasts, videos, and audio content for insights, generate summaries, and extract key information with advanced audio understanding capabilities.

Accessibility solutions

Voice-to-text applications

Create accessible applications with real-time transcription, voice navigation, and audio description capabilities for users with disabilities.

How Inference works

AI infrastructure optimized for audio processing with Voxtral Mini 3B

Choose your audio configuration

Select from pre-configured Voxtral Mini 3B instances optimized for speech processing, transcription, or voice-enabled applications.

Deploy in 3 clicks

Launch your private Voxtral Mini instance across our global infrastructure with smart routing optimized for low-latency audio processing.

Scale voice applications

Use your model with unlimited audio processing at a fixed monthly cost. Scale your voice applications without worrying about per-call API fees.

With Inference, you get enterprise-grade infrastructure management with specialized audio processing capabilities and complete deployment control.
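To make the fixed-cost comparison concrete, here is a minimal break-even sketch. It assumes a flat monthly fee on one side and a simple per-call price on the other; the figures are placeholders chosen for illustration, not Inference pricing or any provider's actual rates.

```python
def break_even_calls(monthly_fee_cents: int, price_per_call_cents: int) -> int:
    """Smallest number of calls per month at which per-call billing
    costs at least as much as the flat monthly fee.

    Amounts are integer cents to avoid floating-point rounding.
    """
    if monthly_fee_cents < 0 or price_per_call_cents <= 0:
        raise ValueError("fee must be >= 0 and per-call price must be > 0")
    # Integer ceiling division: ceil(fee / price)
    return -(-monthly_fee_cents // price_per_call_cents)


# Placeholder figures only: a 500.00 flat fee versus 0.02 per call.
calls_needed = break_even_calls(monthly_fee_cents=50_000, price_per_call_cents=2)
```

Above the returned volume, the flat fee is the cheaper option under these assumptions; below it, per-call billing costs less.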

Ready-to-use audio solutions

Voice application platform

Build and deploy voice-enabled applications with transcription, translation, and function calling capabilities.

Audio processing pipeline

Process large volumes of audio content with automated transcription, summarization, and multilingual analysis.

Real-time voice interface

Deploy interactive voice interfaces with function calling, context awareness, and natural conversation flow.

Frequently asked questions

What audio formats does Voxtral Mini 3B support?

Voxtral Mini 3B supports common audio formats including WAV, MP3, and FLAC. It is optimized for speech processing, with a dedicated transcription mode for high-accuracy conversion across a range of sampling rates and audio qualities.
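A client can check files before uploading them. The sketch below is one possible pre-flight check, not part of any Voxtral or Inference SDK: it tests the file extension against the three formats named above and, for uncompressed PCM WAV only, reads the sampling rate and duration with Python's standard wave module. The function names are hypothetical.

```python
import io
import wave
from pathlib import Path

# Formats named in the answer above; the deployed endpoint may accept more.
SUPPORTED_SUFFIXES = {".wav", ".mp3", ".flac"}


def is_supported(filename: str) -> bool:
    """Extension check only; it does not inspect the file contents."""
    return Path(filename).suffix.lower() in SUPPORTED_SUFFIXES


def wav_summary(source) -> dict:
    """Read basic properties of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "seconds": wav.getnframes() / rate,
        }


# Demo: build one second of 16 kHz mono silence in memory and inspect it.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)
summary = wav_summary(buffer)
```

MP3 and FLAC need a third-party decoder to inspect, so this sketch only validates their extension.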

How many languages are supported for voice processing?

Voxtral Mini 3B provides native multilingual support for 8 languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian) with built-in speech translation. This covers real-time transcription, translation, and voice understanding across the supported languages.

Can I use voice commands to trigger functions in my application?

Yes, Voxtral Mini 3B supports function calling straight from voice inputs. You can build applications that execute specific functions based on voice commands, with full context awareness up to 32k tokens.
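On the application side, a function call from the model still has to be routed to your own code. The sketch below shows one way to do that, assuming the reply has already been parsed into a dictionary with a name and a JSON-encoded arguments string (a shape common to OpenAI-style chat APIs, assumed here rather than confirmed for this deployment). The set_thermostat function and the TOOLS registry are hypothetical.

```python
import json


# Hypothetical application function; not part of any Voxtral or Inference API.
def set_thermostat(room: str, celsius: float) -> str:
    return f"{room} set to {celsius:.1f} C"


# Registry mapping tool names the model may emit to local callables.
TOOLS = {"set_thermostat": set_thermostat}


def dispatch(tool_call: dict) -> str:
    """Run the local function named in a parsed tool call.

    Assumes the shape {"name": str, "arguments": "<JSON object>"}.
    """
    name = tool_call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    kwargs = json.loads(tool_call["arguments"])
    return TOOLS[name](**kwargs)


# A tool call the model might produce for "set the kitchen to 21 degrees".
example_call = {
    "name": "set_thermostat",
    "arguments": '{"room": "kitchen", "celsius": 21}',
}
result = dispatch(example_call)
```

Keeping an explicit registry means a spoken command can only reach functions you have deliberately exposed.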

What's the difference between this and text-only models?

Voxtral Mini 3B enhances Ministral 3B with state-of-the-art audio input capabilities while retaining best-in-class text performance. You get both audio processing and excellent text understanding in a single model.

How does the 32k context length benefit audio applications?

The 32k context length allows processing of long-form audio content, maintaining conversation context across extended interactions, and handling complex audio documents like full meetings or lengthy recordings without losing context.
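As a rough way to reason about that budget, the sketch below estimates how much audio fits alongside a text prompt. The number of tokens the model spends per second of audio is passed in as a parameter; the 12.5 used in the example is an assumed illustrative rate, and 32,000 is a nominal reading of "32k", so check both against the model documentation.

```python
CONTEXT_TOKENS = 32_000  # nominal "32k" window; the exact limit may differ


def fits_in_context(audio_seconds: float,
                    audio_tokens_per_second: float,
                    text_tokens: int = 0,
                    context_tokens: int = CONTEXT_TOKENS) -> bool:
    """True if the audio plus any text prompt stays within the window."""
    used = audio_seconds * audio_tokens_per_second + text_tokens
    return used <= context_tokens


def max_audio_minutes(audio_tokens_per_second: float,
                      text_tokens: int = 0,
                      context_tokens: int = CONTEXT_TOKENS) -> float:
    """Longest audio, in minutes, that fits after reserving text tokens."""
    if audio_tokens_per_second <= 0:
        raise ValueError("audio_tokens_per_second must be positive")
    return (context_tokens - text_tokens) / audio_tokens_per_second / 60


ASSUMED_RATE = 12.5  # illustrative tokens per second of audio, not a confirmed figure

half_hour_ok = fits_in_context(30 * 60, ASSUMED_RATE)   # 30-minute recording
full_hour_ok = fits_in_context(60 * 60, ASSUMED_RATE)   # 60-minute recording
budget_minutes = max_audio_minutes(ASSUMED_RATE, text_tokens=2_000)
```

Recordings that exceed the budget would need to be split into chunks and processed in several requests.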

Deploy Voxtral Mini 3B today

Get advanced audio AI capabilities with speech transcription, multilingual support, and voice function calling. Start with predictable pricing and unlimited usage.

Start deployment