
How to deploy DeepSeek 70B with Ollama and a Web UI on Gcore Everywhere Inference

  • By Gcore
  • January 28, 2025
  • 2 min read

Large language models (LLMs) like DeepSeek 70B are revolutionizing industries by enabling more advanced and dynamic conversational AI solutions. Whether you’re looking to build intelligent customer support systems, enhance content generation, or create data-driven applications, deploying and interacting with LLMs has never been more accessible.

In this tutorial, we’ll show you exactly how to set up DeepSeek 70B using Ollama and a Web UI on Gcore Everywhere Inference. By the end, you’ll have a fully functional environment where you can easily interact with your custom LLM via a user-friendly interface. The process involves four simple steps: deploying Ollama, deploying the web UI, connecting the web UI to Ollama, and pulling the DeepSeek 70B model.

Let’s get started!

Step 1: Deploy Ollama

  1. Log in to Gcore Everywhere Inference and select Deploy Custom Model.
  2. In the Model Image field, enter ollama/ollama.
  3. Set the Port to 11434.
  4. Under Pod Configuration, configure the following:
    • Select GPU-Optimized.
    • Choose a GPU type, such as 1×A100 or 1×H100.
    • Choose a region (e.g., Luxembourg-3).
  5. Set an autoscaling policy or use the default settings.
  6. Name your deployment (e.g., ollama).
  7. Click Deploy model on the right side of the screen.

Once deployed, you’ll have an Ollama endpoint ready to serve your model.
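
Optionally, you can sanity-check that the endpoint responds before moving on. This is a minimal sketch, assuming Python with the requests library installed and an endpoint that is publicly reachable; swap in the hostname shown for your deployment in the Gcore Customer Portal.

```python
import requests

# Placeholder: replace with your endpoint from the Gcore Customer Portal.
OLLAMA_URL = "https://<your-ollama-deployment>.ai.gcore.dev"

# Ollama exposes a version endpoint that works as a simple health check.
resp = requests.get(f"{OLLAMA_URL}/api/version", timeout=10)
resp.raise_for_status()
print(resp.json())  # e.g., {"version": "..."}
```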

Step 2: Deploy the Web UI for Ollama

  1. Go back to the Gcore Everywhere Inference console and select Deploy Custom Model again.
  2. In the Model Image field, enter ghcr.io/open-webui/open-webui:main.
  3. Set the Port to 8080.
  4. Under Pod Configuration, set the following:
    • Select CPU-Optimized.
    • Choose 4 vCPU / 16 GiB RAM.
  5. Select the same region as before (e.g., Luxembourg-3).
  6. Configure an autoscaling policy or use the default settings.
  7. Name your deployment (e.g., webui).
  8. Click Deploy model on the right side of the screen.
  9. Once deployed, navigate to the Web UI endpoint from the Gcore Customer Portal.

Step 3: Configure the Web UI

  1. Open the Web UI endpoint and set up a username and password when prompted.
  2. Log in and navigate to the admin panel.
  3. Go to Settings → Connections and disable the OpenAI API integration.
  4. In the Ollama API field, enter the endpoint for your Ollama deployment. You can find this in the Gcore Customer Portal. It will look similar to this: https://<your-ollama-deployment>.ai.gcore.dev/. (You can verify the URL with the sketch after this list.)
  5. Click Save to confirm your changes.
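
To double-check the URL you just saved, you can query Ollama’s model list directly. A minimal sketch under the same assumptions as above (Python with requests); the list will be empty until you pull a model in the next step.

```python
import requests

# The same URL you entered in the Ollama API field.
OLLAMA_URL = "https://<your-ollama-deployment>.ai.gcore.dev"

# /api/tags lists the models currently available on the Ollama server.
resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10)
resp.raise_for_status()
print(resp.json().get("models", []))  # empty until you pull a model
```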

Step 4: Pull and Use DeepSeek 70B

  1. Open the chat section in the Web UI.
  2. In the Select a model field, type deepseek-r1:70b.
  3. Click Pull to download the model.
  4. Wait for the download to complete.
  5. Once downloaded, select the model and start chatting!
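
If you’d rather script this step than use the UI, Ollama’s HTTP API can pull the model and chat with it directly. This sketch is not part of the Gcore walkthrough itself: it assumes Python with requests and the same publicly reachable endpoint as above, and note that the 70B download is large, so the pull can take a while.

```python
import requests

# Placeholder: your Ollama endpoint from the Gcore Customer Portal.
OLLAMA_URL = "https://<your-ollama-deployment>.ai.gcore.dev"

# Pull DeepSeek 70B; Ollama streams progress as JSON lines.
with requests.post(
    f"{OLLAMA_URL}/api/pull",
    json={"model": "deepseek-r1:70b"},
    stream=True,
    timeout=None,  # a 70B pull can take a long time
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode())  # status updates, e.g., download progress

# Once the pull completes, send a single (non-streaming) chat request.
chat = requests.post(
    f"{OLLAMA_URL}/api/chat",
    json={
        "model": "deepseek-r1:70b",
        "messages": [{"role": "user", "content": "Hello! Introduce yourself."}],
        "stream": False,
    },
    timeout=300,
)
chat.raise_for_status()
print(chat.json()["message"]["content"])
```

Either way, once the model appears in the Web UI’s model list, you can chat with it from the browser as described above.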

Your AI environment is ready to explore

By following these steps, you’ve successfully deployed DeepSeek 70B on Gcore Everywhere Inference with Ollama. This setup provides a powerful and user-friendly environment for experimenting with LLMs, prototyping AI-driven features, or integrating advanced conversational AI into your applications.

Ready to unlock the full potential of AI? Gcore Everywhere Inference offers outstanding scalability, performance, and support, making it the perfect solution for developers and businesses working with advanced AI models. Dive deeper into our powerful tools and resources by exploring our AI blog and docs.

Explore Gcore’s advanced inference solutions.
