How to Run Hugging Face Spaces on Gcore Inference at the Edge

Running machine learning models, especially large-scale models like GPT-3 or BERT, requires substantial computing power and introduces significant latency. This makes real-time applications resource-intensive and challenging to deliver. Running ML models at the edge is a lightweight approach that offers significant advantages for latency, privacy, and resource optimization.

Gcore Inference at the Edge makes it simple to deploy and manage custom models efficiently, so you can scale your favorite Hugging Face models globally in just a few clicks. In this guide, we’ll walk you through how easy it is to harness the power of Gcore’s edge AI infrastructure to deploy a Hugging Face Space model. Whether you’re developing NLP solutions or cutting-edge computer vision applications, deploying at the edge has never been simpler—or more powerful.

Step 1: Log In to the Gcore Customer Portal 

Go to gcore.com and log in to the Gcore Customer Portal. If you don’t yet have an account, go ahead and create one—it’s free. 

Step 2: Go to Inference at the Edge 

In the Gcore Customer Portal, click Inference at the Edge from the left navigation menu. Then click Deploy custom model.

The Gcore Customer Portal enables simple AI model deployment

Step 3: Choose a Hugging Face Model 

Open huggingface.co and browse the available models. Select the model you want to deploy, then navigate to its corresponding Hugging Face Space.

Gcore allows you to run Hugging Face in a few simple steps
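
If you prefer to search from a script rather than the website, the huggingface_hub Python library can list Spaces programmatically. Below is a minimal sketch; the search term is only an example and the library is optional for this workflow.

    # Optional: browse Hugging Face Spaces from Python instead of the website.
    # Requires `pip install huggingface_hub`; the search term is only an example.
    from huggingface_hub import HfApi

    api = HfApi()
    for space in api.list_spaces(search="pixtral", limit=5):
        print(space.id)  # prints Space ids in the form owner/space-name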

Click on Files in the Space and locate the Docker option. 

Gcore allows you to run Hugging Face in a few simple steps

Copy the Docker image link and startup command from the Hugging Face Space.

Gcore allows you to run Hugging Face in a few simple steps via Docker
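
You can paste these values directly into the Gcore deployment form. If you’re curious how the image reference is built, it typically follows the pattern sketched below (an assumption based on this example; always use the exact value shown in the Space’s Docker dialog).

    # Illustrative sketch: how a Space's Docker image reference is typically formed.
    # Use the exact image link copied from the Space rather than constructing it yourself.
    owner, space_name = "ethux", "mistral-pixtral-demo"
    image = f"registry.hf.space/{owner}-{space_name}:latest"
    print(image)  # registry.hf.space/ethux-mistral-pixtral-demo:latest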

Step 4: Deploy the Model on Gcore 

Return to the Gcore Customer Portal deployment page and enter the following details: 

  • Model image URL: registry.hf.space/ethux-mistral-pixtral-demo:latest 
  • Startup command: python app.py 
  • Container port: 7860 

Configure the pod as follows: 

  • GPU-optimized: 1x L40S 
  • vCPUs: 16 
  • RAM: 232 GiB 

For routing placement, choose any available region; for optimal performance, pick one close to your users. Name your deployment and click Deploy.

Deploy custom or pretrained models directly in the Gcore Customer Portal

Step 5: Interact with Your Model 

Once the model is up and running, the portal provides an endpoint. You can now use this endpoint to test and interact with your deployed model at the edge.

The Gcore Customer Portal provides an endpoint for AI model interaction
Gcore enables rapid, simple AI model deployment
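
Most Spaces are Gradio apps, so one quick way to verify the deployment and explore its API from Python is sketched below. The endpoint URL is a placeholder for the one shown in the portal, and the callable parameters depend on the Space you deployed.

    # Minimal sketch: confirm the deployed Space responds, then inspect its API.
    # Replace ENDPOINT with the URL shown in the Gcore Customer Portal.
    import requests
    from gradio_client import Client  # pip install gradio_client

    ENDPOINT = "https://<your-deployment-endpoint>"  # placeholder

    # Liveness check: a running Gradio Space returns HTTP 200 on its root URL.
    print(requests.get(ENDPOINT, timeout=30).status_code)

    # If the Space exposes the standard Gradio API, list its callable endpoints.
    client = Client(ENDPOINT)
    client.view_api()  # prints available endpoints and their parameters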

Powerful, Simple AI Deployment with Gcore 

Gcore Inference at the Edge is the future of AI deployment, combining the ease of Hugging Face integration with the robust infrastructure needed for real-time, scalable, and global solutions. By leveraging edge computing, you can optimize model performance and simultaneously futureproof your business in a world that increasingly demands fast, secure, and localized AI applications. 

Deploying models to the edge allows you to capitalize on real-time insights, improve customer experiences, and outpace your competitors. Whether you’re leading a team of developers or spearheading a new AI initiative, Gcore Inference at the Edge offers the tools you need to innovate at the speed of tomorrow. 

Explore Gcore Inference at the Edge
