About Inference at the Edge

The development of machine learning involves two main stages: training and inference.

In the first stage, an AI model is trained on large datasets, such as a collection of images, to recognize and label objects. This results in a pre-trained model.

The second stage is model inference, where the model is used to make predictions from real user requests. For this stage, it’s crucial that the AI model can respond to users promptly, regardless of network latency and distance from data centers.

Gcore GPU Cloud is designed for creating and training models. For inference, we offer Gcore Inference at the Edge.

What is Gcore Inference at the Edge?

Gcore Inference at the Edge allows customers to deploy pre-trained AI models on edge inference nodes. By bringing AI models closer to end users, the technology ensures ultra-fast response times and optimized performance.

Using Anycast endpoints, end users' queries are directed to the nearest running model, resulting in low latency and an enhanced user experience. This setup is automated through a single endpoint, relieving you of the need to manage, scale, and monitor the underlying infrastructure.

Getting started

For instructions on how to deploy AI models with the global intelligence pipeline, check out our guide on deploying a model.

Inference at the Edge is currently in beta. To join the beta, contact Gcore technical support or your account manager.

How Inference at the Edge works

Inference at the Edge combines two technologies:

1. Edge Network: Provides low latency via Anycast balancing and smart routing.

2. Serverless flexible GPU infrastructure: Enables quick initiation, integration, and deployment.

We provide you with an endpoint that can be integrated into your applications. When your users access this endpoint, their requests are delivered to the nearest Edge nodes. This is achieved through Smart Routing technology, which redirects requests to the closest inference region where the pre-trained model is deployed.
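
For illustration, the sketch below shows what calling such an endpoint from an application might look like. The endpoint URL, the X-Api-Key header, and the request payload are placeholders for this example rather than documented values; use the endpoint and key shown for your deployment in the Gcore Customer Portal.

```python
import requests

# Placeholder endpoint and API key: substitute the values shown for your
# own deployment in the Gcore Customer Portal.
ENDPOINT = "https://your-model.example-inference.gcore.dev/v1/predict"  # hypothetical URL
API_KEY = "your-api-key"

# The same Anycast URL is used everywhere; each request is served by the
# inference region closest to the caller.
response = requests.post(
    ENDPOINT,
    headers={"X-Api-Key": API_KEY},     # header name assumed for illustration
    json={"input": "example payload"},  # payload shape depends on the model
    timeout=30,
)
response.raise_for_status()
print(response.json())
```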

Diagram depicting Smart Routing technology

We also use Healthchecks to monitor the availability of pods. If the Amsterdam-1 pod is experiencing downtime, the request will be automatically sent to the geographically closest inference region, such as Amsterdam-2.

Map depicting Smart Routing across locations

Use cases

Inference at the Edge is a versatile solution for businesses that require low-latency or real-time model responses. It caters to various industries, including:

  • Fintech and banking: Enables prompt anti-fraud detection and real-time credit scoring.

  • Healthcare: Facilitates medical prescriptions based on data from wearable sensors and supports the analysis of medical data.

  • Gaming: Supports automatic opponent selection in competitive games, map generation, and maintaining open worlds.

  • Media: Provides content analysis, automated transcription, and translation of interviews.

  • ISP and internet services: Offers AI-based traffic analysis and DDoS protection.

  • Industrial and manufacturing: Ensures real-time defect detection and fast feedback.

Key benefits

Inference at the Edge offers several key benefits:

  • Low latency: With over 160 points of presence worldwide, requests are transferred quickly to the nearest Inference at the Edge pod, ensuring low latency for users.

  • Flexibility in model selection: Run leading open-source models from our model catalog or deploy your own custom models.

  • High performance: Utilizing the latest, purpose-built NVIDIA GPU hardware, Inference at the Edge delivers fast model inference capable of handling the most demanding workloads.

  • Cost efficiency: Payments are based solely on the runtime of the containers, which automatically scale in and out based on the number of user requests to keep your operations cost-effective.

  • Easy control: Global AI infrastructure can be configured with just a few clicks in the Gcore Customer Portal or via API requests (see the example after this list), simplifying management and control.
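
To illustrate the API-based control mentioned above, the sketch below creates a deployment with a single REST call. The URL path, payload fields, and authorization header are assumptions made for this example, not the documented schema; consult the Gcore API documentation for the actual request format.

```python
import requests

# Hypothetical management API call: the path, payload fields, and auth
# header below are illustrative placeholders, not the documented schema.
API_URL = "https://api.gcore.com/inference/deployments"  # placeholder path
TOKEN = "your-permanent-api-token"

deployment = {
    "name": "my-model",
    "image": "registry.example.com/my-model:latest",     # custom model image
    "flavor": "1xGPU-4vCPU-16GB",                         # placeholder flavor name
    "autoscaling": {"min_replicas": 0, "max_replicas": 3},
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"APIKey {TOKEN}"},  # auth scheme assumed for illustration
    json=deployment,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```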

Supported features

  • Model catalog

  • Custom model deployment

  • Various flavors (vGPU/vCPU/RAM) and storage

  • DDoS and bot protection

  • API keys

  • REST API & Terraform (coming soon)

  • RAG support (coming soon)

AI models

The following foundational open-source models are available in our AI model catalog:

  • DistilBERT: A lightweight version of the BERT language model for generating short text extracts.

  • LLaMA-Pro: A large language model (LLM) for general language understanding and for domain-specific areas, particularly programming and mathematics.

  • Mistral-7B: An LLM that can generate human-quality text, write code, summarize text, and answer questions.

  • ResNet-50: A deep learning model used in computer vision tasks, known for enabling the effective training of very deep networks.

  • Stable Diffusion XL: A model for generating images from text descriptions.

  • Whisper: An automatic speech recognition model for converting spoken language into written text.
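
As an example of using a catalog model once it is deployed, the sketch below queries a Mistral-7B deployment. It assumes the model is exposed through an OpenAI-compatible chat completions route, which is a common serving convention; the actual URL, header, and payload format depend on your deployment.

```python
import requests

# Assumption: the deployed Mistral-7B endpoint exposes an OpenAI-compatible
# chat completions route. Replace the URL and header with your deployment values.
ENDPOINT = "https://mistral-7b.example-inference.gcore.dev/v1/chat/completions"
API_KEY = "your-api-key"

payload = {
    "model": "mistral-7b",
    "messages": [
        {"role": "user", "content": "Summarize edge inference in one sentence."}
    ],
}

resp = requests.post(
    ENDPOINT,
    headers={"X-Api-Key": API_KEY},  # header name assumed for illustration
    json=payload,
    timeout=60,
)
resp.raise_for_status()
# Response parsing assumes the OpenAI-style choices/message structure.
print(resp.json()["choices"][0]["message"]["content"])
```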
