Recently I was knee-deep in refactoring an agentic workflow that had become, frankly, a bit of a Frankenstein’s monster. Half the logic was on Chat Completions with manually managed message arrays, the other half was bolted onto the Assistants API with Threads and Runs. State management was scattered across three different services, and I’d lost track of which conversation context lived where. Sound familiar?
Then Microsoft dropped the Responses API in the March 2025 release, and I’ll be honest, my first reaction was “not another API surface.” But after spending a weekend migrating that messy codebase, I’m a convert. This is, without a doubt, the most architecturally significant API change Azure OpenAI has shipped to date.
This post will cover what the Responses API actually is, how it differs from Chat Completions and the Assistants API, what the new built-in tools look like in practice, and how to start migrating. Let’s dive in!
So What Actually Is the Responses API?
In plain English, the Responses API is a single, unified interface that takes the conversational simplicity of Chat Completions and combines it with the stateful, tool-rich capabilities of the Assistants API. You get server-side state management, built-in tools, and multi-turn conversation chaining, all without needing to wrangle Threads, Runs, or separate API surfaces.
The key idea is that every call returns a response object with a unique ID. You can chain conversations together by passing previous_response_id from one call to the next. The server handles the context window for you. No more manually appending message arrays and praying you haven’t blown your token limit.
Here’s what a basic call looks like:
```csharp
// Install packages:
// dotnet add package OpenAI
// dotnet add package Azure.Identity
using OpenAI;
using OpenAI.Responses;
using System.ClientModel.Primitives;
using Azure.Identity;

#pragma warning disable OPENAI001

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://ai.azure.com/.default");

OpenAIResponseClient client = new(
    model: "gpt-4.1-nano",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    });

OpenAIResponse response = await client.CreateResponseAsync("What's the capital of Australia?");
Console.WriteLine(response.GetOutputText());
```
If you’ve been using Chat Completions, you’ll notice a few things straight away. The endpoint is `/openai/v1/responses` (no more `api-version` query parameters on the v1 routes). The input parameter is `input` rather than `messages`. And the output is a typed response object rather than a `choices` array. Pretty straightforward, right?
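System-level guidance moves too: instead of a `system` role message at the head of a `messages` array, you set instructions on the request itself. A minimal sketch, reusing the `client` from above and assuming the preview SDK exposes the REST `instructions` field as an `Instructions` property on `ResponseCreationOptions`:

```csharp
// Sketch: replacing a system message with request-level instructions.
// Assumes `client` is the OpenAIResponseClient configured earlier, and
// that `Instructions` maps to the REST `instructions` field.
ResponseCreationOptions promptOptions = new()
{
    Instructions = "You are a terse assistant. Answer in one sentence."
};

OpenAIResponse instructedResponse = await client.CreateResponseAsync(
    "What's the capital of Australia?", promptOptions);
Console.WriteLine(instructedResponse.GetOutputText());
```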
Chaining Conversations (Without the Headache)
The real magic is in multi-turn conversations. With Chat Completions, you had to manually track and resend the entire message history on every call. With the Responses API, you just pass the previous response ID:
```csharp
// Initial request
OpenAIResponse response = await client.CreateResponseAsync("Explain quantum computing to me.");

// Follow-up: the model has full context of the previous exchange
OpenAIResponse secondResponse = await client.CreateResponseAsync(
    "Now explain that like I'm five.",
    new ResponseCreationOptions()
    {
        PreviousResponseId = response.Id
    });

Console.WriteLine(secondResponse.GetOutputText());
```
The server retains the conversation state for 30 days by default. No more client-side arrays growing unbounded. No more accidentally dropping system messages when you truncate for token limits. The server just handles it.
Built-in Tools: Web Search, Code Interpreter, MCP, and More
This is where it gets genuinely exciting. The Responses API ships with a suite of built-in tools that previously required either the Assistants API or custom middleware:
- Web Search via Bing grounding, so your model can fetch real-time information
- Code Interpreter for running Python in a sandboxed container
- File Search for document retrieval
- Image Generation with gpt-image-1
- Remote MCP servers, connecting to the broader Model Context Protocol ecosystem
Here’s web search in action:
```csharp
ResponseCreationOptions options = new();
options.Tools.Add(ResponseTool.CreateWebSearchTool());

OpenAIResponse response = await client.CreateResponseAsync(
    "What were the key announcements at Microsoft Build 2025?", options);
Console.WriteLine(response.GetOutputText());
```
That’s it. One parameter. The model decides when to search, executes the search, and synthesises the results into its response. No Bing API keys, no custom function calling plumbing, no stitching results together yourself.
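If you want to see what the model actually did, the response body is a list of typed output items rather than a single string: a tool-using turn typically contains a web search call item followed by the message item with the synthesised answer. A rough sketch of walking those items — the type names here are from the current preview SDK and may shift between versions:

```csharp
// Sketch: inspecting the typed output items on a response.
// Assumes the preview SDK's ResponseItem subtypes for web search
// calls and assistant messages.
foreach (ResponseItem item in response.OutputItems)
{
    switch (item)
    {
        case WebSearchCallResponseItem searchCall:
            Console.WriteLine($"Web search call: {searchCall.Id}");
            break;
        case MessageResponseItem message:
            Console.WriteLine($"Message: {message.Content[0].Text}");
            break;
    }
}
```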
And for MCP, you can point the model at any remote MCP server:
```csharp
ResponseCreationOptions mcpOptions = new();
mcpOptions.Tools.Add(ResponseTool.CreateMcpTool(
    serverLabel: "my_tools",
    serverUri: new Uri("https://my-mcp-server.example.com"),
    toolCallApprovalPolicy: new McpToolCallApprovalPolicy(
        GlobalMcpToolCallApprovalPolicy.NeverRequireApproval)));

OpenAIResponse response = await client.CreateResponseAsync(
    "Search for information about Azure Functions", mcpOptions);
```
This is a big deal. It means your Azure OpenAI models can now natively consume tools from the entire MCP ecosystem, the same protocol used by Claude, GitHub Copilot, and a growing number of developer tools.
Context Compaction: Managing Long Conversations
One of my favourite features (and one that would have saved me a lot of pain in that Frankenstein codebase) is context compaction. When your conversation history grows large, the API can automatically summarise earlier turns to fit within the model’s context window.
You can enable server-side compaction like this:
```csharp
// `conversation` is the accumulated list of input items for this session.
ResponseCreationOptions compactionOptions = new()
{
    Input = conversation,
};
compactionOptions.ContextManagement.Add(new ResponseContextManagement()
{
    Type = "compaction",
    CompactThreshold = 200000
});

OpenAIResponse response = await client.CreateResponseAsync(compactionOptions);
```
When the token count crosses your threshold, the service automatically compacts the context and emits an encrypted compaction item. You don’t need to do anything special; just keep appending output items to your input array as normal.
For those of you running long-lived agent loops or multi-step workflows, this is a game-changer. No more writing custom summarisation logic or awkwardly truncating message histories.
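“Keep appending output items” in practice looks something like the loop below. This is a sketch of a multi-turn agent loop, assuming the preview SDK’s `ResponseItem.CreateUserMessageItem` factory and an overload of `CreateResponseAsync` that accepts a list of input items alongside the compaction options defined above:

```csharp
// Sketch: a growing input list that server-side compaction keeps in check.
// `client` and `compactionOptions` are assumed from the earlier snippets.
List<ResponseItem> conversation = new();

foreach (string userTurn in new[] { "First question...", "A follow-up..." })
{
    conversation.Add(ResponseItem.CreateUserMessageItem(userTurn));

    OpenAIResponse turnResponse = await client.CreateResponseAsync(
        conversation, compactionOptions);

    // Append the model's output items (including any encrypted
    // compaction items) back onto the input for the next turn.
    conversation.AddRange(turnResponse.OutputItems);
}
```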
Background Tasks for Long-Running Workflows
Need to kick off a complex reasoning task that might take a few minutes? The Responses API supports background mode:
```csharp
ResponseCreationOptions backgroundOptions = new()
{
    AllowBackgroundExecution = true
};

OpenAIResponse response = await client.CreateResponseAsync(
    "Analyse this codebase and suggest architectural improvements.",
    backgroundOptions);

// Poll for completion
while (response.Status == ResponseStatus.Queued || response.Status == ResponseStatus.InProgress)
{
    await Task.Delay(2000);
    response = await client.GetResponseAsync(response.Id);
}

Console.WriteLine(response.GetOutputText());
```
This is particularly useful with reasoning models like o3 and o4-mini, where complex tasks can take longer than a typical HTTP timeout allows. You fire the request, go make a coffee, and poll for the result.
So What Happens to Chat Completions?
Good news: Chat Completions isn’t going anywhere. Microsoft has been clear that it remains a supported, stable API for straightforward text generation. But here’s the honest truth: every new feature is landing on the Responses API first. Built-in tools, MCP, compaction, background tasks, computer use… none of these are coming to Chat Completions.
In my opinion, if you’re starting a new project today, use the Responses API. If you have an existing Chat Completions integration that’s working well, there’s no rush to migrate. But if you’re building anything agentic, anything with tools, or anything that needs multi-turn state management, the Responses API is the clear choice.
Here’s a quick comparison to help you decide:
| Capability | Chat Completions | Responses API |
|---|---|---|
| Basic text generation | Yes | Yes |
| Multi-turn state management | Manual (client-side) | Automatic (server-side) |
| Built-in web search | No | Yes |
| Code interpreter | No (Assistants only) | Yes |
| Remote MCP servers | No | Yes |
| Context compaction | No | Yes |
| Background async tasks | No | Yes |
| Computer use | No | Yes |
| Function calling | Yes | Yes |
Getting Started
Migration is pretty straightforward. The key changes are:
- Endpoint: Move from `/chat/completions` (with `api-version` params) to `/openai/v1/responses`
- Client: Use the `OpenAIResponseClient` with a `BearerTokenPolicy` pointed at your Azure resource’s `/openai/v1` endpoint
- Input: Replace `messages` arrays with `input` (and optionally `instructions` for system-level guidance)
- Output: Read from `response.GetOutputText()` instead of `response.Choices[0].Message.Content`
- Multi-turn: Replace manual message history with `PreviousResponseId`
- Function calling: Schemas are now strict by default (which is honestly better practice anyway)
The official migration guide from OpenAI has the full details, and the Azure-specific how-to guide covers the Azure endpoints and authentication patterns.
Note: The Responses API is available in Australia East (among many other regions), so those of us down under get local latency. Always a welcome bonus.
Wrapping Up
The Responses API is the most significant API evolution Azure OpenAI has shipped. It simplifies state management, brings powerful built-in tools to every developer, and opens the door to the MCP ecosystem. If you’ve been juggling Chat Completions and the Assistants API like I was, the unification alone is worth the migration effort. And with features like context compaction and background tasks, it’s clearly built for the agentic workloads that are defining 2025.
As always, feel free to reach out with any questions or comments!
Until next time, stay cloudy!