2024 has been an exceptional year for advancements in artificial intelligence (AI). The variety of models has grown significantly, with impressive strides in performance across domains. Whether it’s text or image classification, text and image generation, speech models, or multimodal capabilities, businesses now face the challenge of navigating an ever-expanding catalog of open-source models. Understanding the differences in tasks and metrics targeted by these models is crucial to making informed decisions.
At Gcore, we’ve been expanding our model catalog to simplify AI model testing and deployment. As businesses scale their AI applications across various units, identifying the best model for specific tasks becomes critical. For example, some applications, like cancer screening, prioritize accuracy over latency. On the other hand, time-sensitive use cases like fraud detection demand rapid processing, while cost may drive decisions for lightweight applications like chatbot development.
This guide provides a comprehensive overview of the AI models supported on the Gcore platform, their characteristics, and their most effective use cases to help you choose the right model for your needs. Our inference solution also supports custom AI models.
Large language models (LLMs)
LLMs are foundational for applications requiring human-like understanding and generation of text, making them crucial for customer service, research, and educational tools. These models are versatile and cover a range of applications:
- Text generation (e.g., creative writing, content creation)
- Summarization
- Question answering
- Instruction following (specific to instruct-tuned models)
- Sentiment analysis
- Translation
- Code generation and debugging (if fine-tuned for programming tasks)
Models supported by Gcore
Gcore supports the following models for inference, available in the Gcore Customer Portal. Activate them at the click of a button.
| Model name | Provider | Parameters | Key characteristics |
|---|---|---|---|
| LLaMA-Pro-8B | Tencent ARC | 8 Billion | Balanced trade-off between cost and power, suitable for real-time applications. |
| Llama-3.2-1B-Instruct | Meta AI | 1 Billion | Ideal for lightweight tasks with minimal computational needs. |
| Llama-3.2-3B-Instruct | Meta AI | 3 Billion | Offers lower latency for moderate task complexity. |
| Llama-3.1-8B-Instruct | Meta AI | 8 Billion | Optimized for instruction following. |
| Mistral-7B-Instruct-v0.3 | Mistral AI | 7 Billion | Excellent for nuanced instruction-based responses. |
| Mistral-Nemo-Instruct-2407 | Mistral AI & NVIDIA | 12 Billion | High efficiency with robust instruction-following capabilities. |
| Qwen2.5-7B-Instruct | Qwen | 7 Billion | Excels in multilingual tasks and general-purpose applications. |
| QwQ-32B-Preview | Qwen | 32 Billion | Suited for complex, multi-turn conversations and strategic decision-making. |
| Marco-o1 | AIDC-AI | 1-5 Billion (est.) | Designed for structured and open-ended problem-solving tasks. |
Business applications
LLMs play a pivotal role in various business scenarios; choosing the right model will be primarily influenced by task complexity. For lightweight tasks like chatbot development and FAQ automation, models like Llama-3.2-1B-Instruct are highly effective. Medium complexity tasks, including document summarization and multilingual sentiment analysis, can leverage models like Llama-3.2-3B-Instruct and Qwen2.5-7B-Instruct. For high-performance needs like real-time customer service or healthcare diagnostics, models like LLaMA-Pro-8B and Mistral-Nemo-Instruct-2407 provide robust solutions. Complex, large-scale applications, like market forecasting and legal document synthesis, are ideally suited for advanced models like QwQ-32B-Preview. Additionally, specialized solutions for niche industries can benefit from Marco-o1’s unique capabilities.
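The complexity-driven model choice described above can be sketched as a simple routing helper. The tier names and the tier-to-model mapping below are illustrative, drawn from the examples in this section; they are not a Gcore API.

```python
# Illustrative mapping of task-complexity tiers to catalog model names.
# The tiers and assignments follow the examples in this guide, not an API.
TIER_TO_MODEL = {
    "lightweight": "Llama-3.2-1B-Instruct",  # chatbots, FAQ automation
    "medium": "Qwen2.5-7B-Instruct",         # summarization, multilingual sentiment
    "high": "Mistral-Nemo-Instruct-2407",    # real-time customer service, diagnostics
    "complex": "QwQ-32B-Preview",            # market forecasting, legal synthesis
}

def pick_model(tier: str) -> str:
    """Return a catalog model name for a given task-complexity tier."""
    try:
        return TIER_TO_MODEL[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
```

A router like this keeps model selection in one place, so swapping a tier's model (say, Llama-3.2-3B-Instruct for medium-complexity tasks) is a one-line change.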
Image generation
Image generation models empower industries like entertainment, advertising, and e-commerce to create engaging content that captures the audience’s attention. These models excel in producing creative and high-quality visuals. Key tasks include:
- Generating photorealistic images
- Artistic rendering (e.g., illustrations, concept art)
- Image enhancement (e.g., super-resolution, inpainting)
- Marketing and branding visuals
Models supported by Gcore
We currently support six models via the Gcore Customer Portal, or you can bring your own image generation model to our inference platform.
| Model name | Provider | Parameters | Key characteristics |
|---|---|---|---|
| ByteDance/SDXL-Lightning | ByteDance | 100-400 Million | Lightning-fast text-to-image generation with 1024px outputs. |
| stable-cascade | Stability AI | 20M-3.6 Billion | Works on smaller latent spaces for faster and cheaper inference. |
| stable-diffusion-xl | Stability AI | ~3.5B Base + 1.2B Refinement | Photorealistic outputs with detailed composition. |
| stable-diffusion-3.5-large-turbo | Stability AI | 8 Billion | Balances high-quality outputs with faster inference. |
| FLUX.1-schnell | Black Forest Labs | 12 Billion | Designed for fast, local development. |
| FLUX.1-dev | Black Forest Labs | 12 Billion | Open-weight model for non-commercial applications. |
Business applications
In high-quality image generation, models like stable-diffusion-xl and stable-cascade are commonly employed for creating marketing visuals, concept art for gaming, and detailed e-commerce product visualizations. Real-time applications, such as AR/VR customizations and interactive customer tools, benefit from the speed of ByteDance/SDXL-Lightning and FLUX.1-schnell. FLUX.1-dev and stable-diffusion-3.5-large-turbo are excellent options for experimentation and development, allowing startups and enterprises to prototype generative AI workflows cost-effectively. Specialized use cases, such as ultra-high-quality visuals for luxury goods or architectural renders, also find tailored solutions with stable-cascade.
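As a sketch of what a text-to-image call might look like, the helper below builds a request payload. The field names (`prompt`, `size`, `num_inference_steps`) follow conventions common to text-to-image APIs and are assumptions here; check your deployment's actual schema before sending.

```python
def build_image_request(prompt: str,
                        model: str = "FLUX.1-schnell",
                        size: str = "1024x1024",
                        steps: int = 4) -> dict:
    """Build a JSON payload for a hypothetical text-to-image endpoint.

    Field names are illustrative; verify them against your deployment.
    Few-step distilled models like FLUX.1-schnell or SDXL-Lightning
    are typically run with very low step counts.
    """
    return {
        "model": model,
        "prompt": prompt,
        "size": size,
        "num_inference_steps": steps,
    }

payload = build_image_request("product shot of a leather handbag, studio lighting")
```

Keeping payload construction in a helper like this makes it easy to A/B the fast models (FLUX.1-schnell, ByteDance/SDXL-Lightning) against the high-fidelity ones (stable-diffusion-xl, stable-cascade) with the same prompt.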
Speech recognition
Speech recognition models are essential for industries like media, healthcare, and education, where transcription accuracy and speed directly affect outcomes. They facilitate:
- Accurate speech-to-text transcription
- Low-latency live audio conversion
- Multilingual speech processing and translation
- Automated note-taking and content creation
Models supported by Gcore
At Gcore, our inference service supports two Whisper models, as well as custom speech recognition models.
| Model name | Provider | Parameters | Key characteristics |
|---|---|---|---|
| whisper-large-v3-turbo | OpenAI | 809 Million | Optimized for speed with minimal accuracy trade-offs. |
| whisper-large-v3 | OpenAI | 1.55 Billion | High-quality multilingual speech-to-text and translation with reduced error rates. |
Business applications
Speech recognition technology supports a wide range of business functions, all requiring precision and accuracy, delivered at speed. For real-time transcription, whisper-large-v3-turbo is ideal for live captioning and speech analytics applications. High-accuracy tasks, including legal transcription, academic research, and multilingual content localization, leverage the advanced capabilities of whisper-large-v3. These models enable faster, more accurate workflows in sectors where precise audio-to-text conversion is crucial.
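A Whisper-style transcription endpoint typically takes an audio file plus a few form fields. The helper below builds those fields; the field names (`model`, `task`, `language`) mirror common Whisper serving conventions and are assumptions here, not a documented Gcore schema.

```python
def build_transcription_fields(model: str = "whisper-large-v3-turbo",
                               language: str = "",
                               translate: bool = False) -> dict:
    """Form fields for a hypothetical Whisper-style transcription endpoint.

    Field names are illustrative; verify them against your deployment.
    `translate=True` requests speech translation (to English) rather than
    plain transcription, a capability of the whisper-large-v3 family.
    """
    fields = {
        "model": model,
        "task": "translate" if translate else "transcribe",
    }
    if language:
        fields["language"] = language  # ISO 639-1 hint, e.g. "de"
    return fields

# Low-latency live captioning vs. high-accuracy multilingual localization:
live = build_transcription_fields("whisper-large-v3-turbo")
localized = build_transcription_fields("whisper-large-v3", language="de", translate=True)
```

The two presets reflect the split described above: the turbo model for real-time captioning, the full model when accuracy and translation matter most.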
Multimodal models
By bridging text, image, and other data modalities, multimodal models unlock innovative solutions for industries requiring complex data analysis. These models integrate diverse data types for applications in:
- Image captioning
- Visual question answering
- Multilingual document processing
- Robotic vision
Models supported by Gcore
We currently support the following multimodal models:
| Model name | Provider | Parameters | Key characteristics |
|---|---|---|---|
| Pixtral-12B-2409 | Mistral AI | 12 Billion | Excels in instruction-following tasks with text and image integration. |
| Qwen2-VL-7B-Instruct | Qwen | 7 Billion | Advanced visual understanding and multilingual support. |
Business applications
For tasks like image captioning and visual question answering, Pixtral-12B-2409 provides robust capabilities in generating descriptive text and answering questions based on visual content. Qwen2-VL-7B-Instruct supports document analysis and robotic vision, enabling systems to extract insights from documents or understand their physical surroundings. These applications are transformative for industries ranging from digital media to robotics.
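Vision-language models are usually queried with a chat message that mixes text and image parts. The helper below builds one in the widely used OpenAI-style content-parts format, which many serving stacks accept; whether your specific deployment uses this exact shape is an assumption to verify.

```python
def build_vision_message(question: str, image_url: str) -> dict:
    """A user chat message combining a text question with an image,
    in the common OpenAI-style content-parts format (verify against
    your deployment's API before use)."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = build_vision_message(
    "What safety hazards are visible in this warehouse photo?",
    "https://example.com/warehouse.jpg",
)
```

The same message structure serves both use cases above: pass a document scan for extraction-style questions, or a camera frame for robotic-vision queries.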
A multitude of models, supported by Gcore
Start developing on the Gcore platform today, leveraging top-tier GPUs for seamless AI model training and deployment. Simplify large-scale, cross-regional AI operations with our inference-at-the-edge solutions, backed by over a decade of CDN expertise.