Deploy AI models in the Customer Portal

With Gcore Inference at the Edge, you can use foundational open-source models from our AI model catalog or deploy a custom model by uploading a Docker container image.

Step 1. Select a model

This step differs slightly depending on whether you deploy a custom model or use a model from our catalog.

Deploy model from the catalog

1. In the Gcore Customer Portal, navigate to Cloud > Inference at the Edge.

2. Open the Overview page.

3. Click Browse model catalog to view the available AI models.

Overview tab with model catalog and custom model sections

4. Hover over the model you are interested in and click it.

Model catalog tab

5. Ensure that the correct model is selected. If it isn't, select another one from the dropdown menu.

A dropdown with model versions

Deploy a custom model

1. In the Gcore Customer Portal, navigate to Cloud > Inference at the Edge.

2. Open the Overview page.

3. Click Deploy custom model.

Overview tab with model catalog and custom model sections

4. In the Registry dropdown, select the storage location of your AI model.

Deploy a model dialog with highlighted registry section

If you need to add a new model registry, click Add registry and then configure it as follows:

  • Image registry name: Registry name that will be displayed in the Registry dropdown.

  • Image registry URL: Link to the location where your AI model is stored.

  • Image registry username: Username you use to access the storage location of your AI model.

  • Image registry password: Password you use to access the storage location of your AI model.

To save the new registry, click Add.

The image with your AI model must be built for the x86-64 (AMD64) architecture.

5. Enter the name of the image with your model. For example: ghcr.io/namespace/image_name:tag or docker.io/username/model:tag.

6. Specify the port that the containerized model listens on, as illustrated in the sketch below. The external port for accessing your deployment is always 443 (HTTPS).

Image name and port sections
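
For illustration, here is a minimal sketch of a containerized model server that listens on an internal port. The port number (8080), the /predict path, and the echo logic are placeholder assumptions, not Gcore requirements; your image serves whatever API your model exposes.

    # server.py: illustrative placeholder; a real image would load actual model weights.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(force=True)
        # Placeholder "inference": echo the input back.
        return jsonify({"input": payload, "output": "model response goes here"})

    if __name__ == "__main__":
        # The container listens on this internal port; enter the same port
        # in the deployment form. External access always goes over HTTPS (443).
        app.run(host="0.0.0.0", port=8080)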

Step 2. Select a flavor

This configuration determines the amount of resources (GPU/vCPU/RAM) allocated for running your model. Make sure you select sufficient resources; otherwise, the model deployment might fail.

Flavor dropdown in the Pod configuration section

We recommend the following flavors based on model size:

Recommended flavor    Model size (billion parameters)
1 × L40S 48 GB        4.1–21
2 × L40S 48 GB        21.1–41
4 × L40S 48 GB        21.1–41

Step 3. Set up routing placement

From the list of available edge PoPs worldwide, select the inference regions where the model will run. The list of available PoPs depends on the pod configuration you selected in Step 2.

Regions dropdown in the Routing placement section

Step 4. Configure autoscaling

You can set up autoscaling for all pods (All selected regions) or only for pods located in particular regions (Custom).

Specify the range of pods you want to maintain:

  • Minimum pods: The minimum number of pods that must be deployed during low-load periods.

  • Maximum pods: The maximum number of pods that can be added during peak-load periods.

Autoscaling section

To ensure more efficient use of computational resources and consistent model performance, define scaling thresholds for GPU and CPU utilization.

Click Advanced settings to view and modify current thresholds:

  • The minimum setting is 1% of the resource capacity.

  • The maximum setting is 100% of the resource capacity.

Autoscaling parameters

By default, the autoscaling parameters are set to 80%, but you can enter any percentage within the specified range.
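
As a rough illustration of what the thresholds mean (this follows the common "desired = current × utilization / target" scaling pattern and is not a description of Gcore's actual autoscaler), a pod count could be derived like this:

    import math

    def desired_pods(current_pods: int, gpu_utilization: float,
                     target_utilization: float = 0.80,
                     min_pods: int = 1, max_pods: int = 5) -> int:
        # Illustrative threshold-based scaling; utilizations are fractions (0.80 == 80%).
        if current_pods == 0:
            return min_pods
        desired = math.ceil(current_pods * gpu_utilization / target_utilization)
        return max(min_pods, min(max_pods, desired))

    # Example: 2 pods at 95% GPU utilization with an 80% target scale out to 3 pods.
    print(desired_pods(current_pods=2, gpu_utilization=0.95))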

Step 5 (Optional). Add environment variables

If you want to pass additional information to your model deployment, create variables for your container as key-value pairs. These variables are only available in the environment of the created container.

Environment variables section
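
For example, if you add a hypothetical key-value pair such as MODEL_TEMPERATURE=0.2 (the variable names below are assumptions, not values Gcore defines), your container code can read it at startup:

    import os

    # Hypothetical variable names; use whatever keys your container expects.
    model_temperature = float(os.environ.get("MODEL_TEMPERATURE", "0.7"))
    log_level = os.environ.get("LOG_LEVEL", "INFO")

    print(f"Starting with temperature={model_temperature} and log level {log_level}")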

Step 6 (Optional). Configure authentication via API keys

You can configure API authentication for your deployment. Turn on the "Enable API Key Authentication" toggle to access the authentication settings.

API keys section with enabled toggle

A single deployment can have multiple API keys, and the same API key can be attached to multiple deployments.

Choose one of the following options:

  • Select API keys: Add one or more keys that are already stored in the Gcore Customer Portal by selecting them from the dropdown list.

  • Create new API key: Generate a new key.

To generate a new key, select the Create new API key link and then perform the following steps:

1. In the dialog that opens, enter a name to identify the key in the system.

2. (Optional) Add a key description to give more context about the key and its usage.

3. As a security measure, you can specify the key expiration date. If you don’t want to regenerate the key and instead want to keep it indefinitely, choose Never expire.

4. Click Create to generate the key.

Create API key dialog with annotated steps

After you generate the key, it will appear in the API Keys dropdown. You can then select it to authenticate to the deployment.
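
As a minimal sketch of calling an API-key-protected deployment from Python (the endpoint URL and request body are placeholders for your own deployment, and the X-Api-Key header name is an assumption; check your deployment details for the exact endpoint and expected header):

    import requests

    # Placeholders: replace with your deployment endpoint and a key created above.
    ENDPOINT = "https://<your-deployment-endpoint>/predict"  # external access is over HTTPS (443)
    API_KEY = "<your-api-key>"

    response = requests.post(
        ENDPOINT,
        headers={"X-Api-Key": API_KEY},  # assumed header name; confirm in your deployment details
        json={"prompt": "Hello"},        # request body depends on your model's API
        timeout=60,
    )
    response.raise_for_status()
    print(response.json())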

Step 7. Specify pod lifetime

Specify the number of seconds after which a pod will be deleted when there are no requests to your pod. For example, if you enter 600, the pod will be deleted in 600 seconds, which is equal to ten minutes.

If you specify 0, the container will take approximately one minute to scale down.

Pod lifetime section

Step 8. Enter deployment details

Enter the deployment name and additional information if needed. This information will be displayed on the Deployments page under Settings.

Deployment details section

Step 9. Finalize deployment

Scroll to the top of the page and click Deploy in the top-right corner of the screen.

Your plan section with an active Deploy button
