Checks whether the global quota is exceeded; if it is, the number of additional quota units needed to create the specified inference deployment is calculated.
API key for authentication. Make sure to include the word apikey, followed by a single space and then your token.
Example: apikey 1234$abcdef
Project ID.
Example: 1
List of containers for the inference instance.
Example:
[
  {
    "region_id": 1,
    "scale": { "max": 3, "min": 1 }
  }
]
Inference flavor name.
Example: "inference-16vcpu-232gib-1xh100-80gb"
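The authentication header and request body described above can be sketched as follows. This is a minimal illustration, not the official client: the helper name and the `flavor_name`/`containers` body field names are assumptions; only the `apikey <token>` header format, the containers list, and the flavor value come from the docs.

```python
import json

# Hypothetical helper that assembles headers and body for creating an
# inference deployment. Field names beyond those shown in the docs are
# assumptions, not the confirmed API schema.
def build_deployment_request(api_token: str, flavor: str, containers: list) -> dict:
    return {
        "headers": {
            # Auth scheme from the docs: the word "apikey", a single space, then your token.
            "Authorization": f"apikey {api_token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "flavor_name": flavor,      # assumed field name
            "containers": containers,   # list of containers, as documented
        }),
    }

request = build_deployment_request(
    "1234$abcdef",
    "inference-16vcpu-232gib-1xh100-80gb",
    [{"region_id": 1, "scale": {"max": 3, "min": 1}}],
)
```

The resulting dictionary can then be passed to any HTTP client; no request is sent here.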
OK
Inference CPU millicore count limit
8000
Inference CPU millicore count requested
3000
Inference CPU millicore count usage
2000
Inference GPU A100 count limit
4
Inference GPU A100 count requested
2
Inference GPU A100 count usage
1
Inference GPU H100 count limit
4
Inference GPU H100 count requested
2
Inference GPU H100 count usage
1
Inference GPU L40S count limit
4
Inference GPU L40S count requested
2
Inference GPU L40S count usage
1
Inference instance count limit
10
Inference instance count requested
1
Inference instance count usage
1
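The limit/requested/usage triples above feed the quota check described at the top of this section. A minimal sketch of that calculation, assuming "additional quota needed" means the shortfall once current usage plus the new request exceeds the limit (the function name and the second set of input values are hypothetical):

```python
# Hedged sketch of the quota-shortfall check. Given one quota entry's
# limit, current usage, and newly requested count, return how many
# additional quota units the deployment would need beyond the limit.
def additional_quota_needed(limit: int, usage: int, requested: int) -> int:
    # Anything the request pushes past the remaining headroom is the shortfall.
    return max(0, (usage + requested) - limit)

# H100 example values from the docs: limit 4, usage 1, requested 2 -> fits.
print(additional_quota_needed(4, 1, 2))  # → 0
# Hypothetical tighter case: usage 3 leaves only 1 unit of headroom.
print(additional_quota_needed(4, 3, 2))  # → 1
```

The same function applies unchanged to CPU millicores, instance counts, or any other quota dimension in the response.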