A client recently asked me to set up a GPT-4.1 deployment in Azure using Terraform—only to discover that the official documentation barely mentions the OpenAI endpoint or how to call it via the Responses API.
In fact, most examples focus on Chat Completions, which left me scrambling to figure out how to wire up `/openai/v1/responses`
against a custom deployment. I’ll walk through exactly what I did, including the Terraform code, a minimal Python test, and tips on region checks and deployment naming.
Prerequisites
- Terraform ≥ 1.5.0 installed and configured.
- An existing Azure Key Vault (we’ll store the OpenAI key there).
- An Azure Resource Group where you have write permissions.
- The Terraform AzureRM provider configured (service principal or managed identity).
- Python ≥ 3.8 and the `openai` package (`pip install openai`).
Checking Your Region
Before you start, confirm that your target region supports both the Azure AI Services (OpenAI) resource and the desired model for the Responses API (GPT-4.1). When I tried to spin this up, I discovered that not every region offers the “DataZoneStandard” SKU or that specific model version. See: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses?tabs=python-secure
Terraform Configuration
Below is the Terraform code I used to:
- Create an Azure AI Services account
- Deploy GPT-4.1 as a cognitive deployment (named `gpt41`)
- Store the AI Services primary key in Key Vault
```hcl
#-----------------------------------------
# 1. AI Services Account
#-----------------------------------------
resource "azurerm_ai_services" "ai_services" {
  name                  = "ai-services"
  location              = "swedencentral"
  resource_group_name   = azurerm_resource_group.rg.name
  sku_name              = "DataZoneStandard" # account-level SKU; note that some provider versions only accept values like "S0" here
  custom_subdomain_name = "demo-ai-custom-subdomain"

  lifecycle {
    prevent_destroy = true
  }
}

#-----------------------------------------
# 2. Cognitive Deployment for GPT-4.1
#-----------------------------------------
resource "azurerm_cognitive_deployment" "gpt41" {
  name                 = "gpt41"
  cognitive_account_id = azurerm_ai_services.ai_services.id

  model {
    format  = "OpenAI"
    name    = "gpt-4.1"
    version = "2025-04-14"
  }

  sku {
    name     = "DataZoneStandard"
    capacity = 300
  }

  version_upgrade_option = "OnceCurrentVersionExpired"

  lifecycle {
    prevent_destroy = true
  }
}

#-----------------------------------------
# 3. Store Primary Access Key in Key Vault
#-----------------------------------------
resource "azurerm_key_vault_secret" "ai_key" {
  name         = "AiServices--Key"
  value        = azurerm_ai_services.ai_services.primary_access_key
  key_vault_id = azurerm_key_vault.kv.id
}
```
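To avoid copy/pasting from the Portal later, you can also expose the base URL and deployment name as Terraform outputs. This is a small addition of my own (the output names are arbitrary); it derives the URL from the custom subdomain rather than relying on any endpoint attribute:

```hcl
#-----------------------------------------
# Optional: outputs for client configuration
#-----------------------------------------
output "openai_v1_base_url" {
  description = "Base URL for the v1 API surface, derived from the custom subdomain"
  value       = "https://${azurerm_ai_services.ai_services.custom_subdomain_name}.openai.azure.com/openai/v1/"
}

output "gpt41_deployment_name" {
  description = "Pass this as the model parameter in client code"
  value       = azurerm_cognitive_deployment.gpt41.name
}
```

After `terraform apply`, `terraform output openai_v1_base_url` gives you the exact string to paste into your client.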
3.1. azurerm_ai_services
- `location`: Must match a region that supports your desired model.
- `sku_name`: I chose `DataZoneStandard`, but there are other choices depending on your needs: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#model-summary-table-and-region-availability
- `custom_subdomain_name`: Defines the subdomain (`https://demo-ai-custom-subdomain.openai.azure.com`), which you’ll need to call `/openai/v1/responses` later. If you omit this, Azure generates a random subdomain that’s harder to remember.
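As a quick sanity check, the full Responses URL is derivable from the subdomain alone. A throwaway helper of my own (not part of the deployment) that just composes the string:

```python
def responses_url(subdomain: str) -> str:
    """Build the Responses API URL for a given custom subdomain.

    Illustrative helper only; /openai/v1/ is the preview v1 API
    surface used later in this article.
    """
    return f"https://{subdomain}.openai.azure.com/openai/v1/responses"

print(responses_url("demo-ai-custom-subdomain"))
```

Because the subdomain is fixed in Terraform, this URL never changes between rebuilds of the resource.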
3.2. azurerm_cognitive_deployment
- `name = "gpt41"`: This is your deployment name. It must match exactly what you pass as `model=...` in your client code.
- `model { format, name, version }`: `format = "OpenAI"` tells Azure we’re using an OpenAI-compatible model. `name = "gpt-4.1"`, `version = "2025-04-14"`. Pin to a specific version so you don’t get unintended updates.
- `sku { name, capacity }`: `name = "DataZoneStandard"`, `capacity = 300`. Adjust based on your performance/throughput needs.
Deploying with Terraform
- Initialize Terraform: `terraform init`
- Review the plan: `terraform plan`
- Apply the plan: `terraform apply`
- Verify in the Portal:
  - Go to Resource Groups → your-rg → ai-services.
  - Under Deployments, confirm `gpt41` appears.
  - Under Keys and Endpoint, copy the Primary Access Key and note the endpoint URL (e.g. `https://demo-ai-custom-subdomain.openai.azure.com`).
Minimal Test Script (Python)
With Terraform done, let’s verify via the Responses API. Azure’s docs mostly cover Chat Completions, but you can hit `/openai/v1/responses` directly. Create a simple Python script (`test_gpt.py`):
```python
import os
from openai import OpenAI

# Retrieve the key from an environment variable
subscription_key = os.getenv("AZURE_OPENAI_API_KEY")

client = OpenAI(
    api_key=subscription_key,
    base_url="https://demo-ai-custom-subdomain.openai.azure.com/openai/v1/",
    default_query={"api-version": "preview"},
)

response = client.responses.create(
    model="gpt41",  # the deployment name from Terraform, not "gpt-4.1"
    input=[
        {"role": "user", "content": "This is a test."}
    ],
)

print(response.model_dump_json(indent=2))
```
- Set the environment variable to the Key Vault secret you stored (fetch it from Key Vault, or hardcode it for a quick test): `export AZURE_OPENAI_API_KEY="<primary-key-from-key-vault>"`
- Run the script: `python3 test_gpt.py`
- Expected output: a JSON object with the GPT-4.1 response. If you see an error such as `"invalid_client"`, double-check your key, endpoint, and deployment name.
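Once the call succeeds, you usually just want the text, not the whole JSON dump. The Responses API returns a list of output items; here is a minimal extraction sketch of my own (the `sample` payload below is a hand-written illustration of the shape, not captured output):

```python
# Hand-written example of a Responses API payload (illustrative only)
sample = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Hello! This is a test reply."}
            ],
        }
    ]
}

def extract_text(resp: dict) -> str:
    """Concatenate all output_text parts from a Responses API payload."""
    parts = []
    for item in resp.get("output", []):
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content.get("text", ""))
    return "".join(parts)

print(extract_text(sample))
```

If you keep the SDK response object instead of raw JSON, recent `openai` versions also expose a `response.output_text` convenience property that does roughly the same thing.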
Why the Custom Name Matters
- Azure’s documentation mostly shows `/openai/deployments/{name}/chat/completions`. It rarely mentions `/openai/v1/responses`, which is why I hit several dead ends.
- The `custom_subdomain_name` ensures your endpoint is predictable: `https://demo-ai-custom-subdomain.openai.azure.com/openai/v1/`. If you don’t set it, Azure generates a GUID-like subdomain, and you constantly have to copy/paste from the Portal.
- The deployment name (`gpt41`) is what you pass as `model` in `client.responses.create(model=…)`. If you pass `model="gpt-4.1"`, it won’t work: Azure expects exactly the deployment name you created in Terraform.

Whenever you see `model=…` in Azure samples, it maps to your custom deployment, not the underlying model name. Always double-check that the `model` value in your client code matches exactly what you created.
Conclusion
I hope this guide saves you time—when I first tackled this, Azure’s docs were sparse on the Responses API and custom deployment naming, forcing me into a lot of trial and error. With Terraform you can:
- Provision the AI Services resource in a supported region
- Deploy GPT-4.1 under a custom name
- Store the access key securely in Key Vault
- Hit `/openai/v1/responses` with a minimal Python script
Huge thanks to Marc Rufer and Cédric Mendelin for reviewing the Terraform code and logic. Once you’ve verified your region, subdomain, and deployment name, you’ll have a rock‐solid GPT-4.1 endpoint up and running in minutes.