We’re thrilled to introduce Gcore Inference at the Edge. This new solution reduces the latency of ML model inference and improves the performance of AI-enabled applications. It’s built on Gcore’s global network of 180+ edge points of presence (PoPs) powered by NVIDIA L40S GPUs. Inference at the Edge particularly benefits latency-sensitive, real-time applications, including generative AI and object recognition. Inference at the Edge is currently in beta and free to use. Read on to learn more about the solution’s features, use cases, and how to get started.
What Is Gcore Inference at the Edge?
Gcore Inference at the Edge enables you to deploy ML models on edge points of presence. Anycast endpoints route end-user queries to the nearest running model for low latency, resulting in a seamless user experience.
There’s no need to manage, scale, or monitor the underlying infrastructure; the setup is fully automated on our side. You simply get a single endpoint to integrate into your application.
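To make that concrete, here is a minimal sketch of what the integration might look like from an application’s point of view. The endpoint URL, authorization header, and request payload below are placeholders assumed for illustration, not the actual Inference at the Edge API; use the endpoint and API key shown in the Gcore Customer Portal for your deployment.

```python
import requests

# Placeholder values for illustration only: the real endpoint URL and API key
# come from the Gcore Customer Portal once your model is deployed.
ENDPOINT = "https://<your-deployment>.example-inference.example.com/generate"  # hypothetical URL
API_KEY = "YOUR_API_KEY"

def generate(prompt: str) -> dict:
    """Post a prompt to the single inference endpoint and return the JSON response."""
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},  # auth scheme assumed for this sketch
        json={"prompt": prompt, "max_new_tokens": 128},  # payload shape assumed for this sketch
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

print(generate("Write a one-line product description for wireless earbuds."))
```

Because routing to the nearest inference region happens behind the endpoint, the application code stays the same no matter where its users are located.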
Inference at the Edge is based on three components:
- Our low-latency network of over 180 edge PoPs in 90+ countries with Smart Routing and an average network latency of 30 ms
- NVIDIA L40S GPUs deployed on Gcore edge PoPs
- Gcore’s model catalog, which offers popular, open-source foundational ML models, including Mistral 7B, Stable-Diffusion XL, and LLaMA Pro 8B
How Does Gcore Inference at the Edge Work?
We provide you with a single endpoint for your applications. When end users access this endpoint, their requests are routed to the edge PoP closest to them.
Here’s how the service works for end users. A user query and the model’s response can be routed in one of two ways (a simplified sketch follows this list):
- Basic query-result route: When a user sends a query, the receiving edge node routes it to the closest available inference region, the one with the lowest latency.
- Alternative query-result route: If the nearest region is unavailable, the edge node redirects the user’s query to the next-closest region.
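The sketch below models only these two cases; it is not Gcore’s actual routing implementation, which happens at the network layer through anycast and Smart Routing. The region names and latency figures are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    latency_ms: float  # latency from the edge node to this inference region
    available: bool    # whether the region currently has capacity

def pick_region(regions: list[Region]) -> Region:
    """Return the lowest-latency available region, falling back to the next closest."""
    # Basic route: consider candidates in order of increasing latency.
    for region in sorted(regions, key=lambda r: r.latency_ms):
        # Alternative route: skip unavailable regions and try the next-closest one.
        if region.available:
            return region
    raise RuntimeError("No inference region is currently available")

regions = [
    Region("frankfurt", latency_ms=12, available=False),
    Region("amsterdam", latency_ms=18, available=True),
    Region("paris", latency_ms=25, available=True),
]
print(pick_region(regions).name)  # "amsterdam": Frankfurt is closer but unavailable
```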
Why Choose Gcore Inference at the Edge?
Inference at the Edge offers several benefits for AI application developers who want to optimize AI inference and improve user experience.
- High performance: The service reduces the time a query and model response spend traversing the network to an average of 30 ms.
- Scalability: Automatically scale your ML model up and down either in a specific region or in all selected regions.
- Cost efficient: Pay only for the resources your ML model uses. Set autoscaling limits to control how many resources your models consume during peak loads (see the hypothetical sketch after this list).
- Quick time-to-market: By delegating infrastructure management to the Gcore team, your engineers save valuable time and can focus on core tasks.
- Easy to use: Inference at the Edge provides an intuitive developer workflow for fast and streamlined development and deployment.
- Enterprise ready: The service provides integrated security with built-in DDoS protection for endpoints and local data processing to help ensure your data privacy and sovereignty.
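As a purely hypothetical illustration of such autoscaling limits, a per-region policy might look something like the sketch below. The field names are invented for this example and do not reflect the actual Inference at the Edge configuration schema; see the product documentation for the real options.

```python
# Hypothetical autoscaling policy for illustration only; the field names are
# invented and do not reflect the actual Inference at the Edge configuration.
autoscaling_policy = {
    "regions": ["frankfurt", "singapore", "ashburn"],  # example region names
    "min_replicas": 0,   # scale to zero when idle to avoid paying for unused GPUs
    "max_replicas": 4,   # cap per-region replicas to bound spend during peak load
    "target_gpu_utilization": 0.7,  # scale out when average utilization exceeds 70%
}
```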
Use Cases for Inference at the Edge
Inference at the Edge can be used across industries. Here are just a few examples of potential use cases:
| Technology | Gaming | Retail | Media and entertainment |
|---|---|---|---|
| Generative AI applications | AI content and map generation | Smart grocery with self-checkout and merchandising | Content analysis |
| Chatbots and virtual assistants | Real-time AI bot customization and conversation | Content generation, predictions, and recommendations | Automated transcribing |
| AI tools for software engineers | Real-time player analytics | Virtual try-on | Real-time translation |
| Data augmentation | | | |
How to Get Started
While in beta, Gcore Inference at the Edge is available by request. If you’d like to try it out, please submit our contact form or, if you’re already a Gcore customer, contact your account manager.
Once you have access, explore our product documentation to get started (a quick end-to-end check follows the list below):
- Deploy an AI model
- Add and configure a registry
- Create and manage API keys
- Manage deployments in the Gcore Customer Portal
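Once a model is deployed and you have an API key, a quick way to confirm everything is wired up is a small smoke test like the one below. The endpoint URL, authorization header, and payload are placeholders assumed for this sketch; note that the measured time includes model inference, not just network latency.

```python
import time
import requests

# Placeholders: use the endpoint shown in the Customer Portal for your deployment
# and an API key created as described in the documentation above.
ENDPOINT = "https://<your-deployment>.example-inference.example.com/generate"  # hypothetical URL
API_KEY = "YOUR_API_KEY"

start = time.perf_counter()
response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},  # auth scheme assumed for this sketch
    json={"prompt": "ping", "max_new_tokens": 1},    # payload shape assumed for this sketch
    timeout=10,
)
elapsed_ms = (time.perf_counter() - start) * 1000
# The elapsed time covers the full request, including inference, not just the network hop.
print(f"HTTP {response.status_code}, round trip {elapsed_ms:.0f} ms")
```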
Conclusion
Gcore Inference at the Edge is a powerful and efficient solution for serving your ML models and improving end-user experiences. It provides low latency and high throughput for your models, built-in DDoS protection, popular foundational models, and other features essential for production-grade AI inference at the edge.
If you’d like a personalized consultation or assistance with the product, please get in touch.