Authorizations
API key for authentication. Make sure to include the word apikey
, followed by a single space and then your token.
Example: apikey 1234$abcdef
Path Parameters
Project ID
1
Body
List of containers for the inference instance.
1
[
{
"region_id": 1,
"scale": {
"cooldown_period": 60,
"max": 3,
"min": 1,
"triggers": {
"cpu": { "threshold": 80 },
"memory": { "threshold": 70 }
}
}
}
]
Flavor name for the inference instance.
1
"inference-16vcpu-232gib-1xh100-80gb"
Docker image for the inference instance. This field should contain the image name and tag in the format 'name:tag', e.g., 'nginx:latest'. It defaults to Docker Hub as the image registry, but any accessible Docker image URL can be specified.
"nginx:latest"
Listening port for the inference instance.
1 <= x <= 65535
80
Inference instance name.
4 - 30
"my-instance"
List of API keys for the inference instance. Multiple keys can be attached to one deployment.If auth_enabled
and api_keys
are both specified, a ValidationError will be raised.
["key1", "key2"]
Set to true
to enable API key authentication for the inference instance. "Authorization": "Bearer ****\*"
or "X-Api-Key": "****\*"
header is required for the requests to the instance if enabled. This field is deprecated and will be removed in the future. Use api_keys
field instead.If auth_enabled
and api_keys
are both specified, a ValidationError will be raised.
false
Command to be executed when running a container from an image.
["nginx", "-g", "daemon off;"]
Registry credentials name
"dockerhub"
Inference instance description.
"My first instance"
Environment variables for the inference instance.
{ "DEBUG_MODE": "False", "KEY": "12345" }
Ingress options for the inference instance
{ "disable_response_buffering": true }
Logging configuration for the inference instance
{
"destination_region_id": 1,
"enabled": true,
"retention_policy": { "period": 42 },
"topic_name": "my-log-name"
}
{ "enabled": false }
Probes configured for all containers of the inference instance. If probes are not provided, and the image_name
is from a the Model Catalog registry, the default probes will be used.
Specifies the duration in seconds without any requests after which the containers will be downscaled to their minimum scale value as defined by scale.min
. If set, this helps in optimizing resource usage by reducing the number of container instances during periods of inactivity. The default value when the parameter is not set is 120.
x >= 0
120
Response
OK
List of task IDs representing asynchronous operations. Use these IDs to monitor operation progress:
* GET /v1/tasks/{
task_id}
- Check individual task status and details
Poll task status until completion (FINISHED
/ERROR
) before proceeding with dependent operations.
["d478ae29-dedc-4869-82f0-96104425f565"]