
Deploy Hugging Face models on edge inference nodes

Deploy Hugging Face AI models on Gcore edge inference nodes to set up, manage, and deliver your AI-powered solutions in real time.

Whether you're working on natural language processing, computer vision, or other AI tasks, using edge inference significantly improves response times and scalability of your workloads.

Step 1. Set up a Hugging Face Space

1. In the Hugging Face models catalog, select the model you want to deploy for your edge inference solution.

2. Navigate to a Space that uses the model. For example, for the Pixtral-12B-2409 model, select any of its associated Spaces.

Spaces section in Hugging Face

3. Copy the Docker image link and startup command according to the instructions from the official Hugging Face guide.
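
For the Pixtral demo Space used in this guide, the values you copy look like the two lines below. The image link follows the general pattern registry.hf.space/<space-owner>-<space-name>:latest; verify the exact values in the Space's Docker instructions.

```
registry.hf.space/ethux-mistral-pixtral-demo:latest
python app.py
```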

Step 2. Deploy the Hugging Face model on edge inference

1. In the Gcore Customer Portal, click Inference at the Edge.

Overview page with options to create custom models or from catalog

2. Click Deploy custom model.

3. Configure the model image:

  • In the Model image URL (docker tag) field, enter the following link: registry.hf.space/ethux-mistral-pixtral-demo:latest.

  • Enable the Set startup command toggle and add the executable command python app.py.

Configurations for a public Hugging Face AI model

4. Set the Container port to 7860, the default port that Gradio-based Spaces listen on (you can verify this locally with the optional Docker check after step 8).

Container port settings for a public Hugging Face AI model

5. Configure the pod:

  • Processor type: GPU-optimized

  • Flavor: 1x L40S / 16 vCPU / 232 GiB RAM

Pod settings for a public Hugging Face AI model

6. In the Routing placement field, choose the available region closest to your users to ensure optimal performance.

7. Enter a name for your deployment.

8. Click Deploy.
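
Optionally, you can verify the image, startup command, and container port combination outside the portal by running the same container locally. This is a minimal sketch that assumes Docker is installed on your machine; note that a 12B-parameter model realistically needs a local GPU (for example, via the --gpus all flag) to respond in a reasonable time.

```
docker run -it -p 7860:7860 registry.hf.space/ethux-mistral-pixtral-demo:latest python app.py
```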

Step 3. Interact with the model

Once the model is up and running, you'll get a link to its endpoint. Use this endpoint to test and interact with the model you've deployed at the edge.
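
For example, because this Space runs a Gradio app, you can call the endpoint programmatically with the gradio_client Python package. The snippet below is only a minimal sketch: the endpoint URL, the prompt-only input, and the /chat route name are assumptions, so take the real endpoint from the Gcore Customer Portal and the route from the Space's "Use via API" page.

```python
# Minimal sketch: calling the deployed Gradio app through its public endpoint.
# The endpoint URL and the api_name route below are placeholders, not confirmed values.
from gradio_client import Client

client = Client("https://<your-deployment-endpoint>")  # endpoint link from the Gcore Customer Portal

result = client.predict(
    "Describe the image I will send next.",  # assumed input: a text prompt; the Space may expect more inputs
    api_name="/chat",                        # assumed route name; verify on the Space's "Use via API" page
)
print(result)
```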
