
How to Run Hugging Face Spaces on Gcore Inference at the Edge

  • By Gcore
  • October 17, 2024
  • 2 min read

Running machine learning models, especially large-scale models like GPT-3 or BERT, demands significant computing power and often introduces high latency. This makes real-time applications resource-intensive and challenging to deliver. Running ML models at the edge is a lightweight approach that offers significant advantages in latency, privacy, and resource optimization.

Gcore Inference at the Edge makes it simple to deploy and manage custom models efficiently, letting you scale your favorite Hugging Face models globally in just a few clicks. In this guide, we’ll walk you through how easy it is to harness Gcore’s edge AI infrastructure to deploy a Hugging Face Space model. Whether you’re developing NLP solutions or cutting-edge computer vision applications, deploying at the edge has never been simpler or more powerful.

Step 1: Log In to the Gcore Customer Portal

Go to gcore.com and log in to the Gcore Customer Portal. If you don’t yet have an account, go ahead and create one—it’s free. 

Step 2: Go to Inference at the Edge

In the Gcore Customer Portal, click Inference at the Edge in the left navigation menu, then click Deploy custom model.

Step 3: Choose a Hugging Face Model

Open huggingface.co and browse the available models. Select the model you want to deploy, then navigate to its corresponding Hugging Face Space.

Click on Files in the Space and locate the Docker option. 

Copy the Docker image link and startup command from the Hugging Face Space.

Step 4: Deploy the Model on Gcore

Return to the Gcore Customer Portal deployment page and enter the following details: 

  • Model image URL: registry.hf.space/ethux-mistral-pixtral-demo:latest
  • Startup command: python app.py
  • Container port: 7860 
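
Before deploying, you can optionally sanity-check these values by running the same container locally with Docker. This is a minimal sketch, assuming Docker is installed and your machine is capable of running the image; the image, command, and port are the same values listed above:

  docker run -it -p 7860:7860 --platform=linux/amd64 registry.hf.space/ethux-mistral-pixtral-demo:latest python app.py

If the container starts successfully, the app should be reachable at http://localhost:7860, confirming that the image URL, startup command, and container port are consistent before you deploy them on Gcore.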

Configure the pod as follows: 

  • GPU-optimized: 1x L40S
  • vCPUs: 16
  • RAM: 232 GiB

Choose an available region for routing placement; for optimal performance, pick the region closest to your end users. Name your deployment and click Deploy.

Step 5: Interact with Your Model

Once the model is up and running, you’ll be provided with an endpoint. You can now interact with the model via this endpoint to test and use your deployed model at the edge.
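
If the Space you deployed is a Gradio app (as in this example, which serves on port 7860), one way to call it programmatically is with the gradio_client Python package. The sketch below is illustrative only: the endpoint URL is a placeholder for the one shown in your Gcore portal, and the api_name and inputs depend on the specific Space, so inspect them with view_api() first.

  # pip install gradio_client
  from gradio_client import Client

  # Placeholder: replace with the endpoint URL shown for your Gcore deployment
  client = Client("https://<your-deployment-endpoint>")

  # Print the Space's callable endpoints and their expected parameters
  client.view_api()

  # Example call; api_name and inputs vary per Space, so adjust to what view_api() reports
  result = client.predict(
      "Describe the capabilities of this model in one sentence.",
      api_name="/chat",
  )
  print(result)

You can also open the endpoint URL in a browser to use the Space’s web UI directly.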

Powerful, Simple AI Deployment with Gcore

Gcore Inference at the Edge is the future of AI deployment, combining the ease of Hugging Face integration with the robust infrastructure needed for real-time, scalable, and global solutions. By leveraging edge computing, you can optimize model performance and simultaneously futureproof your business in a world that increasingly demands fast, secure, and localized AI applications. 

Deploying models to the edge allows you to capitalize on real-time insights, improve customer experiences, and outpace your competitors. Whether you’re leading a team of developers or spearheading a new AI initiative, Gcore Inference at the Edge offers the tools you need to innovate at the speed of tomorrow. 

Explore Gcore Inference at the Edge

