Step 1. Initialize a quota request
To create a new quota request, click this direct link or navigate to Everywhere Inference > Quotas in the Gcore Customer Portal. Either route opens the Account Quotas dialog, where you can view and modify your Everywhere Inference quotas.
Step 2. Update the account quotas
The Account Quotas dialog shows an overview of your currently configured quotas. Update the values there to match what you need for the new request.
Deployment example
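Let’s look at the following settings for a model deployment:
- The 1xL40S / 16 vCPU / 232 GiB RAM flavor
- Two regions
- One pod
Quotas to request for this deployment:
- 1 deployment
- 20000 millicores
- 2 GPU L40S
Quotas to request with double the capacity, for example to leave room for scaling up:
- 1 deployment
- 40000 millicores
- 4 GPU L40S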
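The two sets of figures above differ only in the total number of pods, so the arithmetic can be sketched in a few lines. This is a minimal illustration, not part of the Gcore API: the `QuotaEstimate` class and the `estimate_quota` function are hypothetical names. The per-pod values are assumptions taken from the example: 1 GPU per pod (from the 1xL40S flavor name) and 10000 millicores per pod (inferred from 20000 millicores across two pods). The second call assumes the larger figures correspond to two pods per region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaEstimate:
    """Hypothetical container for the three quota values in the example."""
    deployments: int
    millicores: int
    gpus: int


def estimate_quota(regions: int, max_pods_per_region: int,
                   millicores_per_pod: int, gpus_per_pod: int,
                   deployments: int = 1) -> QuotaEstimate:
    """Multiply per-pod resources by the total pod count across regions."""
    total_pods = regions * max_pods_per_region
    return QuotaEstimate(
        deployments=deployments,
        millicores=total_pods * millicores_per_pod,
        gpus=total_pods * gpus_per_pod,
    )


# Assumed per-pod values: 1 GPU (from the "1xL40S" flavor name) and
# 10000 millicores (inferred from the example: 20000 / 2 pods).
baseline = estimate_quota(regions=2, max_pods_per_region=1,
                          millicores_per_pod=10_000, gpus_per_pod=1)

# Assumption: the larger figures correspond to two pods per region.
headroom = estimate_quota(regions=2, max_pods_per_region=2,
                          millicores_per_pod=10_000, gpus_per_pod=1)

print(baseline)
print(headroom)
```

Sizing the request by the maximum pod count you expect, rather than the current one, is what leaves room to raise the limits later.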
Info: If you’re not sure about your requirements, request more than you think you need so you can change your autoscaling settings later.
Step 3. Send the quota increase request
The Request form is on the right. Fill it out with a description explaining why you need the increase, then click the Send request button.