With Gcore Inference at the Edge, you can use foundational open-source models from our AI model catalog or deploy a custom model by uploading a Docker container image.
The steps differ slightly depending on whether you use a model from our catalog or deploy a custom model.

To use a model from the catalog:

1. In the Gcore Customer Portal, navigate to Cloud > Inference at the Edge.
2. Open the Overview page.
3. Click Browse model catalog to view the available AI models.
4. Hover over the model you want to use and click it.
5. Ensure that the correct model is selected. If not, select another one from the dropdown menu.
To deploy a custom model:

1. In the Gcore Customer Portal, navigate to Cloud > Inference at the Edge.
2. Open the Overview page.
3. Click Deploy custom model.
4. In the Registry dropdown, select the storage location of your AI model.
If you need to add a new model registry, click Add registry and then configure it as follows:
- Image registry name: The name that will be displayed in the Registry dropdown.
- Image registry URL: A link to the location where your AI model is stored.
- Image registry username: The username you use to access the storage location of your AI model.
- Image registry password: The password you use to access the storage location of your AI model.
To save the new registry, click Add.
The image with your AI model must be built for the x86-64 (AMD64) architecture.
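If you build images on a non-x86 machine (for example, an Apple Silicon laptop), Docker Buildx can cross-build for the required platform. The following is a minimal sketch; the image tag is a placeholder, and running Docker via Python is just one convenient way to script the build:

```python
import subprocess

# Placeholder image tag; replace with your own registry path.
IMAGE = "docker.io/username/model:tag"

# Build for linux/amd64 (x86-64) regardless of the host architecture,
# then push the image to the registry in the same step.
subprocess.run(
    [
        "docker", "buildx", "build",
        "--platform", "linux/amd64",
        "-t", IMAGE,
        "--push",
        ".",
    ],
    check=True,
)
```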
5. Enter the name of the image with your model, for example ghcr.io/namespace/image_name:tag or docker.io/username/model:tag.
6. Specify the port on which the containerized model listens. The external port for accessing your deployment is always 443 (HTTPS).
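The container port is simply whatever your model server binds to inside the container. Below is a minimal sketch of such a server using only the Python standard library, with a hypothetical port 8080 and a placeholder /predict handler; a real deployment would typically use a model-serving framework instead:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = 8080  # Must match the container port you specify in this step.

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder "model": echo the prompt back. A real container
        # would run inference here.
        body = json.dumps({"output": payload.get("prompt", "")}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the port is reachable from outside the container.
    HTTPServer(("0.0.0.0", PORT), InferenceHandler).serve_forever()
```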
This configuration determines the resources (GPU/vCPU/RAM) allocated to run your model. Make sure you select sufficient resources; otherwise, the model deployment might fail.
We recommend the following flavor parameters based on model size:
| Recommended flavor | Billion parameters |
|---|---|
| 1 × L40S 48 GB | 4.1–21 |
| 2 × L40S 48 GB | 21.1–41 |
| 4 × L40S 48 GB | 21.1–41 |
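These recommendations line up with a common rule of thumb: FP16/BF16 weights take roughly 2 bytes per parameter, and activations plus the KV cache need additional headroom on top. A rough sizing sketch (the rule of thumb is a general heuristic, not a Gcore figure):

```python
def weight_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate VRAM needed for model weights in FP16/BF16.
    Activations and the KV cache require extra headroom beyond this."""
    return params_billion * bytes_per_param

# A 21B-parameter model needs about 42 GB for weights alone, close to
# the 48 GB capacity of a single L40S; larger models therefore need
# to spread across two or more GPUs.
print(weight_vram_gb(21))  # 42.0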
From the list of available worldwide edge PoPs, select the inference regions where the model will run. The list of available PoPs depends on the pod configuration you selected in Step 2.
You can set up autoscaling for all pods (All selected regions) or only for pods located in particular regions (Custom).
Specify the range of pods you want to maintain:
- Minimum pods: The minimum number of pods that must be deployed during low-load periods.
- Maximum pods: The maximum number of pods that can be added during peak-load periods.
To ensure more efficient use of computational resources and consistent model performance, define scaling thresholds for GPU and CPU utilization.
Click Advanced settings to view and modify the current thresholds:
- The minimum setting is 1% of the resource capacity.
- The maximum setting is 100% of the resource capacity.
By default, the autoscaling thresholds are set to 80%, but you can enter any percentage within this range.
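To illustrate how a utilization threshold drives scaling, here is a sketch of the standard threshold-based heuristic used by Kubernetes-style autoscalers; Gcore's exact algorithm is not documented here, so treat this as an approximation:

```python
import math

def desired_pods(current_pods: int, current_utilization: float,
                 target_utilization: float = 0.80,
                 min_pods: int = 1, max_pods: int = 5) -> int:
    """Scale pod count proportionally to how far current utilization
    is from the target threshold, clamped to the configured pod range."""
    desired = math.ceil(current_pods * current_utilization / target_utilization)
    return max(min_pods, min(max_pods, desired))

# 2 pods running at 95% GPU utilization against an 80% target -> 3 pods.
print(desired_pods(2, 0.95))  # 3
```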
If you need to pass additional information to your model deployment, create variables for your container as key-value pairs. These variables are available only in the environment of the created container.
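Inside the container, these are read like any other environment variables. A minimal sketch, assuming hypothetical MODEL_NAME and MAX_BATCH_SIZE variables defined in this step:

```python
import os

# Hypothetical variables set as key-value pairs in the portal.
model_name = os.environ.get("MODEL_NAME", "default-model")
max_batch = int(os.environ.get("MAX_BATCH_SIZE", "8"))

print(f"Loading {model_name} with batch size {max_batch}")
```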
You can configure API authentication for your deployment. Turn on the "Enable API Key Authentication" toggle to access the authentication settings.
A single deployment can have multiple API keys, and the same API key can be attached to multiple deployments.
Choose one of the following options:
- Select API keys: Add one or more keys that are already stored in the Gcore Customer Portal by selecting them from the dropdown list.
- Create new API key: Generate a new key.
To generate a new key, select the Create new API key link and then perform the following steps:
1. In the dialog that opens, enter a key name to identify the key in the system.
2. (Optional) Add a key description to give more context about the key and its usage.
3. As a security measure, you can specify the key expiration date. If you don’t want to regenerate the key and instead want to keep it indefinitely, choose Never expire.
4. Click Create to generate the key.
After you generate the key, it will appear in the API Keys dropdown. You can then select it to authenticate to the deployment.
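Clients then send the key with each request. A minimal sketch using the third-party requests library; the endpoint URL is a placeholder, and the X-Api-Key header name is an assumption, so check your deployment details for the exact header to use:

```python
import requests  # third-party: pip install requests

# Placeholder endpoint; use the address shown on your deployment page.
ENDPOINT = "https://example-deployment.ai.gcore.dev/predict"

# Header name is an assumption; confirm it in your deployment details.
headers = {"X-Api-Key": "<your-api-key>"}

response = requests.post(ENDPOINT, headers=headers,
                         json={"prompt": "Hello"}, timeout=30)
response.raise_for_status()
print(response.json())
```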
Specify the number of seconds of inactivity after which a pod is deleted. For example, if you enter 600, a pod that receives no requests will be deleted after 600 seconds (ten minutes).
If you specify 0, the container takes approximately one minute to scale down.
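Keep in mind that once all pods have scaled down, the first request after an idle period triggers a cold start, so clients may need to retry. A simple retry-with-backoff sketch, reusing the placeholder endpoint and header from the authentication example above:

```python
import time
import requests

def query_with_retry(url: str, payload: dict, headers: dict,
                     attempts: int = 5) -> dict:
    """Retry with exponential backoff to ride out a cold start
    after the deployment has scaled down to zero pods."""
    for attempt in range(attempts):
        try:
            resp = requests.post(url, json=payload, headers=headers, timeout=60)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, 8s...
```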
Enter the deployment name and additional information if needed. This information will be displayed on the Deployments page under Settings.
Scroll to the top of the page and click Deploy in the top-right corner of the screen.