Running machine learning models, especially large-scale models like GPT-3 or BERT, demands substantial computing power and introduces significant latency, making real-time applications resource-intensive and challenging to deliver. Running ML models at the edge is a lightweight alternative that offers clear advantages in latency, privacy, and resource optimization.
Gcore Inference at the Edge makes it simple to deploy and manage custom models efficiently, letting you scale your favorite Hugging Face models globally in just a few clicks. In this guide, we’ll walk you through how easy it is to harness Gcore’s edge AI infrastructure to deploy a Hugging Face Space model. Whether you’re developing NLP solutions or cutting-edge computer vision applications, deploying at the edge has never been simpler or more powerful.
Step 1: Log In to the Gcore Customer Portal
Go to gcore.com and log in to the Gcore Customer Portal. If you don’t yet have an account, go ahead and create one—it’s free.
Step 2: Go to Inference at the Edge
In the Gcore Customer Portal, click Inference at the Edge from the left navigation menu. Then click Deploy custom model.
Step 3: Choose a Hugging Face Model
Open huggingface.co and browse the available models. Select the model you want to deploy, then navigate to its corresponding Hugging Face Space.
Click on Files in the Space and locate the Docker option.
Copy the Docker image link and startup command from the Hugging Face Space.
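As a rule of thumb, the image link for a public Space follows the pattern registry.hf.space/<owner>-<space-name>:latest, as in the example used in the next step. Here’s a minimal sketch of that convention (an observed pattern, not an official API; always confirm against the Docker option shown in the Space itself):

```python
# Hypothetical helper: derive the Docker image URL for a public Hugging Face
# Space from its "owner/space-name" ID. This mirrors the observed pattern
# (e.g. registry.hf.space/ethux-mistral-pixtral-demo:latest); confirm the
# exact link in the Space's Docker option before deploying.
def space_image_url(space_id: str, tag: str = "latest") -> str:
    owner, name = space_id.split("/", 1)
    return f"registry.hf.space/{owner}-{name}:{tag}"

print(space_image_url("ethux/mistral-pixtral-demo"))
# registry.hf.space/ethux-mistral-pixtral-demo:latest
```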
Step 4: Deploy the Model on Gcore
Return to the Gcore Customer Portal deployment page and enter the following details:
- Model image URL: registry.hf.space/ethux-mistral-pixtral-demo:latest
- Startup command: python app.py
- Container port: 7860
Configure the pod as follows:
- GPU-optimized: 1x L40S
- vCPUs: 16
- RAM: 232 GiB
For routing placement, any available region will work; for optimal performance, choose the region closest to your users. Name your deployment and click Deploy.
Step 5: Interact with Your Model
Once the model is up and running, you’ll be provided with an endpoint. You can use this endpoint to test and interact with your deployed model at the edge.
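Since the example image packages a Gradio app, one way to exercise the endpoint is the gradio_client library. Below is a minimal sketch assuming a placeholder endpoint URL and a hypothetical chat-style API route; check your deployment details in the Gcore Customer Portal and the Space’s own API docs for the real values:

```python
# Minimal sketch using gradio_client (pip install gradio_client).
# The endpoint URL below is a placeholder: substitute the endpoint shown in
# the Gcore Customer Portal. The api_name and argument shape depend on the
# app inside the container; inspect the Space's "Use via API" page to confirm.
from gradio_client import Client

client = Client("https://<your-deployment-endpoint>")  # placeholder endpoint

result = client.predict(
    "Describe Inference at the Edge in one sentence.",  # assumed prompt input
    api_name="/chat",  # assumed route; confirm against the Space's API docs
)
print(result)
```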
Powerful, Simple AI Deployment with Gcore
Gcore Inference at the Edge is the future of AI deployment, combining the ease of Hugging Face integration with the robust infrastructure needed for real-time, scalable, and global solutions. By leveraging edge computing, you can optimize model performance and simultaneously future-proof your business in a world that increasingly demands fast, secure, and localized AI applications.
Deploying models to the edge allows you to capitalize on real-time insights, improve customer experiences, and outpace your competitors. Whether you’re leading a team of developers or spearheading a new AI initiative, Gcore Inference at the Edge offers the tools you need to innovate at the speed of tomorrow.