Create inference deployment

Python

import os
from gcore import Gcore

client = Gcore(
    api_key=os.environ.get("GCORE_API_KEY"),  # This is the default and can be omitted
)
task_id_list = client.cloud.inference.deployments.create(
    project_id=1,
    containers=[{
        "region_id": 1,
        "scale": {
            "max": 3,
            "min": 1,
        },
    }],
    flavor_name="inference-16vcpu-232gib-1xh100-80gb",
    image="nginx:latest",
    listening_port=80,
    name="my-instance",
)
print(task_id_list.tasks)

{
  "tasks": [
    "<string>"
  ]
}

POST cloud / inference / {project_id} / deployments

Authorizations

Authorization

string

header

required

API key for authentication. Make sure to include the word apikey, followed by a single space and then your token. Example: apikey 1234$abcdef
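For callers who build raw HTTP requests instead of using the SDK, the header format described above can be sketched as follows. The helper name `build_auth_header` is hypothetical and not part of any Gcore library; it only encodes the documented "apikey, single space, token" rule.

```python
def build_auth_header(token: str) -> dict[str, str]:
    """Return the Authorization header in the documented 'apikey <token>' form.

    Hypothetical helper for illustration; not part of the Gcore SDK.
    """
    token = token.strip()
    if not token:
        raise ValueError("API token must not be empty")
    # The word 'apikey', exactly one space, then the token.
    return {"Authorization": f"apikey {token}"}


headers = build_auth_header("1234$abcdef")
print(headers["Authorization"])
```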

Path Parameters

project_id

integer

required

Project ID

Example:

1

Body

application/json

containers

ContainerInSerializerV3 · object[]

required

List of containers for the inference instance.

Minimum array length: 1

Example:

[
  {
    "region_id": 1,
    "scale": {
      "cooldown_period": 60,
      "max": 3,
      "min": 1,
      "triggers": {
        "cpu": { "threshold": 80 },
        "memory": { "threshold": 70 }
      }
    }
  }
]
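A minimal sketch of assembling one entry of the containers list in the shape shown in the example above. The helper `make_container` and its parameter names are hypothetical; the check that min does not exceed max is an assumption about sensible input, not a documented server-side rule.

```python
from typing import Any, Optional


def make_container(
    region_id: int,
    min_scale: int,
    max_scale: int,
    cooldown_period: Optional[int] = None,
    cpu_threshold: Optional[int] = None,
    memory_threshold: Optional[int] = None,
) -> dict[str, Any]:
    """Build one container entry; optional fields are omitted when not given."""
    # Assumption: a minimum above the maximum is never intended.
    if min_scale > max_scale:
        raise ValueError("scale.min must not exceed scale.max")

    scale: dict[str, Any] = {"min": min_scale, "max": max_scale}
    if cooldown_period is not None:
        scale["cooldown_period"] = cooldown_period

    triggers: dict[str, Any] = {}
    if cpu_threshold is not None:
        triggers["cpu"] = {"threshold": cpu_threshold}
    if memory_threshold is not None:
        triggers["memory"] = {"threshold": memory_threshold}
    if triggers:
        scale["triggers"] = triggers

    return {"region_id": region_id, "scale": scale}


containers = [make_container(1, 1, 3, cooldown_period=60,
                             cpu_threshold=80, memory_threshold=70)]
print(containers)
```

The resulting list can be passed as the containers argument of the create call; at least one entry is required.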

flavor_name

string

required

Flavor name for the inference instance.

Minimum string length: 1

Example:

"inference-16vcpu-232gib-1xh100-80gb"

image

string

required

Docker image for the inference instance. This field should contain the image name and tag in the format 'name:tag', e.g., 'nginx:latest'. It defaults to Docker Hub as the image registry, but any accessible Docker image URL can be specified.

Pattern: ^(?:(?:[a-z0-9]+(?:[._-][a-z0-9]+)*/)*[a-z0-9]+(?:[._-][a-z0-9]+)*)(?::[A-Za-z0-9_][A-Za-z0-9_.-]{0,127})?$

Example:

"nginx:latest"
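The pattern above can be checked client-side before sending a request. This is a sketch only: the function name `is_valid_image` is hypothetical, and the server remains the authority on what it accepts.

```python
import re

# Pattern copied from the field description above.
IMAGE_PATTERN = re.compile(
    r"^(?:(?:[a-z0-9]+(?:[._-][a-z0-9]+)*/)*[a-z0-9]+(?:[._-][a-z0-9]+)*)"
    r"(?::[A-Za-z0-9_][A-Za-z0-9_.-]{0,127})?$"
)


def is_valid_image(image: str) -> bool:
    """Return True if the image reference matches the documented pattern."""
    # fullmatch rejects strings with a trailing newline, which '$' alone allows.
    return IMAGE_PATTERN.fullmatch(image) is not None


for candidate in ("nginx:latest", "nginx", "Nginx:latest", "nginx:"):
    print(candidate, is_valid_image(candidate))
```

Note that the tag is optional in the pattern, and uppercase letters are allowed only in the tag, not in the image name.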

listening_port

integer

required

Listening port for the inference instance.

Required range: 1 <= x <= 65535

Example:

80

name

string

required

Inference instance name.

Required string length: 4 - 30

Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$

Example:

"my-instance"
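The two name constraints above (length 4 to 30 and the lowercase pattern) can be combined into one pre-flight check. The helper `validate_name` is hypothetical and shown only to illustrate how the rules interact.

```python
import re

# Pattern copied from the field description above.
NAME_PATTERN = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def validate_name(name: str) -> list[str]:
    """Return a list of problems with the name; an empty list means it passes."""
    problems = []
    if not 4 <= len(name) <= 30:
        problems.append("length must be between 4 and 30 characters")
    if NAME_PATTERN.fullmatch(name) is None:
        problems.append(
            "must use lowercase letters, digits, and hyphens, "
            "and must start and end with a letter or digit"
        )
    return problems


for candidate in ("my-instance", "abc", "My-Instance", "-bad"):
    print(candidate, validate_name(candidate))
```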

api_keys

string[]

List of API keys for the inference instance. Multiple keys can be attached to one deployment. If auth_enabled and api_keys are both specified, a ValidationError will be raised.

Example:

["key1", "key2"]

auth_enabled

boolean

default:false

deprecated

Set to true to enable API key authentication for the inference instance. When enabled, requests to the instance must include either an "Authorization": "Bearer *****" or an "X-Api-Key": "*****" header. This field is deprecated and will be removed in the future; use the api_keys field instead. If auth_enabled and api_keys are both specified, a ValidationError will be raised.

Example:

false
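Because api_keys and the deprecated auth_enabled cannot be sent together, a caller may want to catch the conflict before the request is made. The sketch below uses a hypothetical helper `auth_kwargs`; it assumes that passing any explicit value for auth_enabled, including false, counts as "specified", which the description does not spell out.

```python
from typing import Any, Optional


def auth_kwargs(
    api_keys: Optional[list[str]] = None,
    auth_enabled: Optional[bool] = None,
) -> dict[str, Any]:
    """Return only the authentication fields that should be sent.

    Raises ValueError locally instead of waiting for the API's ValidationError.
    """
    # Assumption: any explicit auth_enabled value counts as "specified".
    if api_keys is not None and auth_enabled is not None:
        raise ValueError("specify either api_keys or auth_enabled, not both")
    if api_keys is not None:
        return {"api_keys": list(api_keys)}
    if auth_enabled is not None:
        return {"auth_enabled": auth_enabled}
    return {}


print(auth_kwargs(api_keys=["key1", "key2"]))
```

The returned dictionary can be unpacked into the create call, so that fields left unset are not sent at all.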

command

string[] | null

Command to be executed when running a container from an image.

Example:

["nginx", "-g", "daemon off;"]

credentials_name

string | null

default:""

Registry credentials name

Example:

"dockerhub"

description

string | null

default:""

Inference instance description.

Example:

"My first instance"

envs

Envs · object

Environment variables for the inference instance.

Example:

{ "DEBUG_MODE": "False", "KEY": "12345" }

ingress_opts

IngressOptsSerializer · object

Ingress options for the inference instance

Example:

{ "disable_response_buffering": true }

logging

LoggingInSerializer · object

Logging configuration for the inference instance

Example:

{
  "destination_region_id": 1,
  "enabled": true,
  "retention_policy": { "period": 42 },
  "topic_name": "my-log-name"
}

probes

InferenceInstanceProbesSerializerV2 · object

Probes configured for all containers of the inference instance. If probes are not provided and the image is from the Model Catalog registry, the default probes will be used.

timeout

integer | null

default:120

Idle period in seconds: if no requests arrive for this long, the containers are scaled down to the minimum defined by scale.min. This reduces resource usage during periods of inactivity. Defaults to 120 when the parameter is not set.

Required range: x >= 0

Example:

120

Response

200 - application/json

tasks

string[]

required

List of task IDs representing asynchronous operations. Use these IDs to monitor operation progress:

GET /v1/tasks/{task_id} - Check individual task status and details.

Poll the task status until it reaches a terminal state (FINISHED or ERROR) before proceeding with dependent operations.

Example:

["d478ae29-dedc-4869-82f0-96104425f565"]
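The polling advice above can be sketched as a small loop. Everything here is an assumption for illustration: `wait_for_tasks` is a hypothetical helper, and `get_state` is a caller-supplied function that returns the current state string for a task ID (for example by calling GET /v1/tasks/{task_id}). Only the terminal states FINISHED and ERROR come from the description; the interval and timeout values are arbitrary.

```python
import time
from typing import Callable, Iterable

# Terminal states named in the response description.
TERMINAL_STATES = {"FINISHED", "ERROR"}


def wait_for_tasks(
    task_ids: Iterable[str],
    get_state: Callable[[str], str],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Poll every task until all reach a terminal state; return the final states."""
    deadline = time.monotonic() + timeout
    pending = list(task_ids)
    states: dict[str, str] = {}

    while pending:
        still_pending = []
        for task_id in pending:
            state = get_state(task_id)
            states[task_id] = state
            if state not in TERMINAL_STATES:
                still_pending.append(task_id)
        pending = still_pending
        if not pending:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"tasks still pending after {timeout}s: {pending}")
        sleep(interval)

    return states
```

A caller would pass the tasks list from the response and then check that every returned state is FINISHED before using the deployment; a state of ERROR means the operation did not complete.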
