To ensure that only authenticated clients can access your AI models, you must deploy an inference instance with authorization enabled.
When deploying an AI model, set the auth_enabled option to true. An API key is then automatically generated and linked to the deployment.
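The exact request depends on the deployment API you are using; the Python sketch below is only illustrative, and the endpoint (DEPLOYMENTS_URL), payload fields, and admin credential (ADMIN_TOKEN) are hypothetical placeholders. Only the auth_enabled option comes from this article.

import os

import requests

# Hypothetical deployment endpoint and admin credential; substitute the actual
# API URL and authentication scheme of your platform.
DEPLOYMENTS_URL = "https://api.example.com/v1/deployments"
ADMIN_TOKEN = os.environ["ADMIN_TOKEN"]

payload = {
    "name": "my-llm-deployment",                   # hypothetical field
    "model": "meta-llama/Llama-3.3-70B-Instruct",  # hypothetical field
    "auth_enabled": True,                          # generates and links an API key
}

resp = requests.post(
    DEPLOYMENTS_URL,
    headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())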
Once the deployment is created with authentication enabled, you can retrieve the API key via the designated API endpoint.
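How the key is fetched also depends on your platform. The sketch below assumes a hypothetical key endpoint (KEY_URL), deployment identifier, and response field (api_key); substitute the path and fields documented in your platform's API reference.

import os

import requests

# Hypothetical endpoint for fetching a deployment's API key.
DEPLOYMENT_ID = "my-llm-deployment-id"
KEY_URL = f"https://api.example.com/v1/deployments/{DEPLOYMENT_ID}/api-key"
ADMIN_TOKEN = os.environ["ADMIN_TOKEN"]

resp = requests.get(
    KEY_URL,
    headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
llm_key = resp.json().get("api_key")  # assumed response field name
print(llm_key)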
Once you have retrieved the API key, include it in your API requests using the X-API-Key header.
Here’s an example demonstrating how to use the API key for authorization:
import os

from openai import OpenAI

# Base URL of the inference deployment and its API key, read from
# environment variables for this example.
LLM_API = os.environ["LLM_API"]
LLM_KEY = os.environ["LLM_KEY"]

def get_llm_response(message: str) -> str:
    # Pass the key as the client credential and also in the X-API-Key header.
    client = OpenAI(api_key=LLM_KEY, base_url=f"{LLM_API}/v1")
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=[
            {"role": "user", "content": message},
        ],
        extra_headers={"X-API-Key": LLM_KEY},
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(get_llm_response("Why is the sky blue?"))
To learn more about deploying AI models, refer to our dedicated guide.