

The Application Catalog provides pre-built open-source AI models that you can deploy without building or configuring a container image.
If your account quota is insufficient for the selected flavor, request a quota increase before deploying.

Create a deployment

Complete the following steps to select an application, configure compute resources, and deploy the model.

Step 1. Open the Create Deployment form

There are two ways to open the Create Deployment form:
  • Navigate to Everywhere Inference > Deployments and click Deploy application from catalog.
  • Open a model detail page in the Application Catalog and click Deploy Application. The form opens with the selected model pre-filled.
[Screenshot: Deployments page showing the Deploy application from catalog and Deploy custom inference buttons]

Step 2. Select an application

Under Deployment Configuration, click the Application dropdown and select the model to deploy.
[Screenshot: Create Deployment form with the Deployment Configuration and Routing placement sections]

Step 3. Select regions

Under Routing placement, click Select region and select up to six regions where the model will run.

Step 4. Configure application modules

Under Application modules, configure the main module:
  • Expose — controls whether the module endpoint is publicly accessible. Enabled by default.
  • Flavor type — select CPU-optimized or GPU-optimized depending on the model requirements.
  • Flavor — select the hardware configuration from the dropdown.
  • Minimum pods — the minimum number of pods to keep running during low-traffic periods.
  • Maximum pods — the maximum number of pods Gcore provisions during peak traffic.
Under Parameters, adjust inference settings for the model:
  • Chunked prefill — splits long prompts into smaller chunks for processing. Default: true.
  • CPU Cache — caches KV tensors in CPU memory when GPU memory is full. Default: true.
  • Prefix caching — reuses cached KV state for repeated prompt prefixes. Default: true.
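For intuition, these parameters mirror options found in open-source serving engines. The sketch below shows how similarly named flags appear in vLLM; it is an illustration under the assumption that the catalog's toggles behave like vLLM's, not a statement about what Gcore runs internally. The model name and the swap_space value are placeholders.

```python
# A minimal sketch, assuming the catalog's Parameters behave like the
# similarly named vLLM options; the actual serving stack is not documented here.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical catalog model
    enable_chunked_prefill=True,  # Chunked prefill: process long prompts in smaller chunks
    enable_prefix_caching=True,   # Prefix caching: reuse KV state across repeated prompt prefixes
    swap_space=4,                 # rough analogue of CPU Cache: GiB of CPU memory for offloaded KV blocks
)
print(llm.generate("Hello!")[0].outputs[0].text)
```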
[Screenshot: Application modules section showing Parameters and the Open WebUI module]
The main application module cannot be removed.
Some models include optional modules, listed below the main module card. Open WebUI is a browser-based chat interface that lets users interact with the deployed model without writing code. Toggle it on to enable it. When enabled, Open WebUI runs as a separate module with its own compute settings:
  • Expose — controls whether the Open WebUI endpoint is publicly accessible.
  • Flavor type — select CPU-optimized or GPU-optimized.
  • Flavor — select the hardware configuration for the Open WebUI module.
  • Minimum pods and Maximum pods — set the pod scaling range for the module.
[Screenshot: Open WebUI module enabled, showing Expose, Flavor type, Flavor, and pod count settings]
Each enabled module consumes one inference instance from the quota. For example, a deployment with the main module and Open WebUI enabled consumes two instances.

Step 5. Name the deployment

Under Deployment details, enter a name for the deployment. Use only letters and numbers — hyphens are not allowed in deployment names.
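If you create deployments from scripts, a client-side check of this naming rule catches invalid names before submission. The helper below is a hypothetical sketch of the stated rule (letters and digits only); the platform's exact server-side validation is not documented here.

```python
import re

def is_valid_deployment_name(name: str) -> bool:
    # Stated rule: only letters and numbers, no hyphens.
    return re.fullmatch(r"[A-Za-z0-9]+", name) is not None

assert is_valid_deployment_name("llamachat01")
assert not is_valid_deployment_name("llama-chat-01")  # hyphens are rejected
```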

Step 6. Set additional options

(Optional) Turn on the Enable API Key authentication toggle to restrict access to the deployment endpoint using API keys.
[Screenshot: Additional options section with the Enable API Key authentication toggle enabled and the API key selector]
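With the toggle enabled, clients must present an API key on every request to the exposed endpoint. The snippet below is a sketch that assumes a standard bearer-token header; this page does not specify the header scheme, so confirm the exact format in Query a deployed model. The endpoint and key values are placeholders.

```python
import requests

ENDPOINT = "https://<your-deployment-endpoint>"  # copied from the deployment detail page
API_KEY = "<your-api-key>"

# Assumption: the key travels as a bearer token; verify the scheme in
# "Query a deployed model" before relying on it.
resp = requests.get(ENDPOINT, headers={"Authorization": f"Bearer {API_KEY}"}, timeout=30)
print(resp.status_code)  # a missing or wrong key should yield 401/403
```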

Step 7. Deploy

Review the estimated cost in the right panel, then click Deploy model. Gcore creates the deployment and opens the Deployments page, where the deployment status is visible.

Deployment status

The new deployment appears on the Deployments page with a Deploying status. Once all pods are running, the status changes to Active. The endpoint URL becomes available on the deployment detail page. Use it to send inference requests as described in Query a deployed model. Logs, compute settings, and other deployment options are available on the deployment detail page.
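As a quick smoke test once the status is Active, send a request to the endpoint URL. The sketch below assumes the deployed model exposes an OpenAI-compatible chat-completions route, which is common for catalog LLMs but is not confirmed by this page; the endpoint, key, and model name are placeholders, and the API key header is only needed if you enabled authentication in Step 6.

```python
import requests

ENDPOINT = "https://<your-deployment-endpoint>"  # shown on the deployment detail page
API_KEY = "<your-api-key>"  # only if API key authentication is enabled

# Assumption: an OpenAI-compatible /v1/chat/completions route; see
# "Query a deployed model" for the authoritative request format.
resp = requests.post(
    f"{ENDPOINT}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "<model-name>",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```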