Building Your First Agent on Microsoft Foundry: From Zero to Production

Every second customer conversation I’ve had this year has included the same question: “We’ve been using Azure OpenAI for chat completions. How hard is it to build an actual agent?” Six months ago, my honest answer would have been “doable, but bring duct tape.” Not anymore. What used to require stitching together the Assistants API, custom orchestration logic, and a prayer to the demo gods is now a managed, production-supported platform with proper tooling and observability.

The Foundry Agent Service graduated to General Availability at Build 2025, and it’s the real deal. In this post, I’ll walk through building an agent from scratch using the .NET SDK, adding tools, testing it, enabling tracing with Application Insights, and getting it production-ready. If you’ve been curious about agents but haven’t built one yet, this is your starting point.

What Even Is an Agent?

Before we dive in, let’s clear up what “agent” actually means in this context, because the term gets thrown around a lot.

An agent is an AI application that uses a large language model to reason about user requests and take autonomous actions to fulfil them. Unlike a basic chatbot that just generates text, an agent can call tools, access external data, and make decisions across multiple steps to complete a task. Every agent has three core components:

  • Model (LLM): Provides the reasoning and language capabilities
  • Instructions: Define the agent’s goals, constraints, and behaviour
  • Tools: Give the agent access to data or actions (search, code execution, API calls)

The Foundry Agent Service handles the hosting, scaling, identity, observability, and enterprise security. You focus on the agent logic.

Prerequisites

Before we start, you’ll need:

  • An Azure subscription with a Microsoft Foundry resource and project
  • A model deployed in your project (I’m using gpt-4.1 for this walkthrough)
  • .NET 8.0+ with the Azure.AI.Projects, Azure.AI.Extensions.OpenAI, and Azure.Identity packages installed
  • An Application Insights resource (for tracing, which we’ll set up later)

Install the packages with:

dotnet add package Azure.AI.Projects
dotnet add package Azure.AI.Extensions.OpenAI
dotnet add package Azure.Identity

Step 1: Create Your First Agent

First cab off the rank, let’s create a basic agent. The new SDK uses a PromptAgentDefinition to define your agent’s model, instructions, and tools:

// Install packages:
// dotnet add package Azure.AI.Projects
// dotnet add package Azure.AI.Extensions.OpenAI
// dotnet add package Azure.Identity

using Azure.Identity;
using Azure.AI.Projects;
using Azure.AI.Extensions.OpenAI;

string projectEndpoint = "https://your-resource.ai.azure.com/api/projects/your-project";

// Create the project client
AIProjectClient projectClient = new(
    endpoint: new Uri(projectEndpoint),
    tokenProvider: new DefaultAzureCredential());

// Create a simple agent
AgentDefinition agentDefinition = new PromptAgentDefinition("gpt-4.1")
{
    Instructions = "You are a helpful IT helpdesk agent. " +
                   "Answer technical questions clearly and concisely. " +
                   "If you don't know the answer, say so honestly.",
};

AgentVersion agent = await projectClient.Agents.CreateAgentVersionAsync(
    "helpdesk-agent",
    options: new(agentDefinition));

Console.WriteLine($"Agent created: {agent.Name} (version: {agent.Version})");

That’s it for a basic agent. No infrastructure to provision, no containers to deploy, no orchestration framework to configure. The Agent Service handles all of that.

Step 2: Add Some Tools

A basic agent is nice, but an agent with tools is where it gets interesting. Let’s give our helpdesk agent web search capability and a code interpreter:

var agentDefinition = new PromptAgentDefinition("gpt-4.1")
{
    Instructions = "You are a helpful IT helpdesk agent. " +
                   "Use web search to find current documentation and solutions. " +
                   "Use the code interpreter to run diagnostic scripts when needed. " +
                   "Always cite your sources.",
    Tools = {
        new WebSearchToolDefinition(),
        new CodeInterpreterToolDefinition(),
    }
};

AgentVersion agent = await projectClient.Agents.CreateAgentVersionAsync(
    "helpdesk-agent-v2",
    options: new(agentDefinition));

The tool catalog in Foundry has over 1,400 tools available, including:

  • Web Search for real-time information via Bing
  • Code Interpreter for running Python in a sandboxed container
  • Azure AI Search for querying your enterprise search indexes
  • File Search for document retrieval
  • MCP servers for connecting to the Model Context Protocol ecosystem
  • Azure Functions for custom serverless tools
  • Microsoft Fabric for enterprise data queries

You can also define custom function tools for anything specific to your domain.
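
For example, a custom function tool describes a domain-specific action as a JSON schema the model can call. A sketch only – the get_ticket_status function is hypothetical, and the exact tool type may differ between SDK previews (earlier Azure.AI.Projects builds exposed a FunctionToolDefinition), so check your package version:

// Hypothetical custom function tool for the helpdesk domain
var ticketTool = new FunctionToolDefinition(
    name: "get_ticket_status",
    description: "Looks up the status of an IT helpdesk ticket by its ID.",
    parameters: BinaryData.FromObjectAsJson(new
    {
        type = "object",
        properties = new
        {
            ticketId = new { type = "string", description = "The ticket number" }
        },
        required = new[] { "ticketId" }
    }));

var agentDefinition = new PromptAgentDefinition("gpt-4.1")
{
    Instructions = "You are an IT helpdesk agent. Use get_ticket_status to check on tickets.",
    Tools = { ticketTool }
};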

Step 3: Talk to Your Agent

Now let’s have a conversation with our agent. The Agent Service uses the Responses API under the hood, so you interact with agents through the ProjectResponsesClient:

// Get a responses client scoped to the agent
ProjectResponsesClient responsesClient = projectClient.ProjectOpenAIClient
    .GetProjectResponsesClientForAgent(agent.Name);

// Send a message to the agent
ResponseResult response = await responsesClient.CreateResponseAsync(
    new CreateResponseOptions
    {
        InputItems = { ResponseItem.CreateUserMessageItem("How do I reset a user's MFA in Microsoft Entra ID?") }
    });

Console.WriteLine(response.GetOutputText());

The GetProjectResponsesClientForAgent method returns a client that automatically routes requests through your agent, including its instructions and tools. The agent will use web search if it needs current information and code interpreter if the task requires computation.

For streaming responses (which is what you’d want in a real UI):

await foreach (StreamingResponseUpdate update in responsesClient.CreateResponseStreamingAsync(
    new CreateResponseOptions
    {
        InputItems = { ResponseItem.CreateUserMessageItem(
            "What's the PowerShell command to check Azure AD sync status?") }
    }))
{
    if (update is ResponseTextDeltaUpdate textDelta)
    {
        Console.Write(textDelta.Delta);
    }
    else if (update is ResponseItemDoneUpdate itemDone)
    {
        if (itemDone.Item is MessageResponseItem message)
        {
            foreach (var annotation in message.Content.Last().Annotations)
            {
                if (annotation is UrlCitationAnnotation citation)
                {
                    Console.WriteLine($"\nSource: {citation.Url}");
                }
            }
        }
    }
}

Step 4: Enable Tracing with Application Insights

Here’s something I wish I’d had in every agent demo I’ve put together: built-in tracing. The Agent Service automatically captures traces for every model call, tool invocation, and decision your agent makes. You just need to connect Application Insights.

In your Foundry project settings, link an Application Insights resource. Once connected, every agent interaction is traced using OpenTelemetry semantic conventions and surfaces in the Foundry portal’s Observability section.

What you can see in the traces:

  • The full conversation thread with user inputs and agent outputs
  • Every tool call the agent made, including inputs and outputs
  • Model reasoning steps and token usage
  • Latency breakdown across each step
  • Error details when things go wrong

This is genuinely invaluable for debugging. When someone says “the agent gave me a wrong answer,” you can trace back through the exact sequence of tool calls and model reasoning to understand why. No more black-box debugging.
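
The service traces the agent runtime for you; if you also want spans from your own application code to land in the same Application Insights resource, you can wire up the Azure Monitor OpenTelemetry exporter. A minimal sketch, assuming the OpenTelemetry and Azure.Monitor.OpenTelemetry.Exporter packages and a placeholder connection string:

using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;
using Azure.Monitor.OpenTelemetry.Exporter;

// Export this app's traces to the same Application Insights resource
using TracerProvider tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource("HelpdeskAgentApp") // your ActivitySource name
    .AddAzureMonitorTraceExporter(options =>
        options.ConnectionString = "<your-app-insights-connection-string>")
    .Build();

ActivitySource source = new("HelpdeskAgentApp");
using (Activity? activity = source.StartActivity("call-helpdesk-agent"))
{
    // ... call the agent here; this span sits alongside the service-side traces
}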

Step 5: Model Selection Guidance

Choosing the right model for your agent matters more than you might think. Here’s my rough guide based on what I’ve seen work well:

| Model | Best For | Trade-off |
| --- | --- | --- |
| gpt-4.1 | General-purpose agents, complex reasoning | Higher cost, excellent quality |
| gpt-4.1-mini | Production agents with good quality/cost balance | Slightly less capable, much cheaper |
| gpt-4.1-nano | High-volume, simple task agents | Fast and cheap, less nuanced |
| o4-mini | Agents needing multi-step reasoning | Slower (thinking time), very capable |
| gpt-5-mini | Latest generation, strong reasoning | Best quality/cost ratio for new projects |

The beauty of the Agent Service is that you can swap models without changing your agent code. Start with gpt-4.1-mini for development, upgrade to gpt-5-mini for production if you need better quality.
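
Concretely, a model swap is just a new version of the same agent with a different model name – a sketch reusing the CreateAgentVersionAsync call from Step 1:

// Same instructions and tools, different model; this becomes the next
// version of "helpdesk-agent", and you can roll back to the previous one.
AgentDefinition upgradedDefinition = new PromptAgentDefinition("gpt-5-mini")
{
    Instructions = "You are a helpful IT helpdesk agent. " +
                   "Answer technical questions clearly and concisely.",
};

AgentVersion upgraded = await projectClient.Agents.CreateAgentVersionAsync(
    "helpdesk-agent",
    options: new(upgradedDefinition));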

Step 6: Publish and Monitor

Once you’re happy with your agent, you can publish it to create a stable, versioned endpoint:

// Publish the agent (creates a stable endpoint)
var published = await projectClient.Agents.PublishAsync(
    agentName: agent.Name,
    agentVersion: agent.Version);

Published agents get:

  • Versioning: Every iteration is snapshotted. Roll back to any previous version.
  • Stable endpoints: A production URL that doesn’t change between versions.
  • Distribution: Share through Microsoft Teams, Microsoft 365 Copilot, or the Entra Agent Registry.

For monitoring, the Foundry portal provides service metrics dashboards showing agent run counts, response latency, tool invocation patterns, and error rates. You can set up alerts on these metrics through Azure Monitor, same as any other Azure resource.

Gotchas and Tips

A few things I’ve learned from building agents on this platform:

  • Be specific in your instructions. Vague instructions like “be helpful” lead to inconsistent behaviour. Tell the agent exactly what it should and shouldn’t do, what tools to prefer, and how to handle edge cases.
  • Test with diverse inputs. Agents can surprise you. Run a proper evaluation before going to production.
  • Watch your tool usage costs. Web search and code interpreter have their own pricing on top of model token costs. Monitor your tool invocation patterns.
  • Use tool_choice strategically. Setting tool_choice="required" forces the agent to use a tool, which is useful when you know a tool call is needed (like grounding on search data) – see the sketch below.
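
With the standard OpenAI Responses client (covered in depth in a later post), forcing a tool call looks roughly like this. A sketch only – it assumes the SDK's ResponseToolChoice factory methods, so verify the exact names against your SDK version:

ResponseCreationOptions options = new();
options.Tools.Add(ResponseTool.CreateWebSearchTool());
// Assumption: CreateRequiredChoice() forces the model to call some tool this turn
options.ToolChoice = ResponseToolChoice.CreateRequiredChoice();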

Wrapping Up

Building an agent on Microsoft Foundry has gone from a multi-day infrastructure exercise to something you can genuinely get running in an afternoon. The managed runtime, built-in tools, tracing with Application Insights, and production publishing workflow mean you can focus on your agent’s logic rather than plumbing.

As always, feel free to reach out with any questions or comments!

Until next time, stay cloudy!

Microsoft Foundry: Azure AI’s Biggest Platform Shift, What Changed and Why You Should Care

If you’ve been keeping score at home, the Azure AI platform has had more name changes than a witness protection program. Azure AI Studio became Azure AI Foundry at Ignite 2024, and then, barely six months later at Build 2025, Microsoft dropped the “Azure” entirely and rebranded the whole thing to Microsoft Foundry. I’ll admit my first reaction was “here we go again.” But this time it’s not just a name change. Microsoft has genuinely consolidated the platform underneath, and the result is the most significant simplification of the Azure AI developer experience since… well, since they first launched Azure OpenAI Service.

This post will walk through the rebrand lineage (because keeping track of the names alone can be daunting), what actually changed architecturally, what the migration path looks like for existing Azure OpenAI users, and what’s still hanging around in “Foundry Classic.” Let’s dive in!

The Name Game: A Brief History

Let me save you some confusion with a quick timeline:

  1. Azure AI Studio (2023): The original portal for working with Azure OpenAI models, prompt engineering, and fine-tuning.
  2. Azure AI Foundry (Ignite 2024, November): Rebranded and expanded to include hubs, projects, and a broader set of AI services beyond just OpenAI models.
  3. Microsoft Foundry (Build 2025, May 19): The big one. Not just a rebrand but a genuine platform consolidation. Azure OpenAI, Azure AI Services, and the old hub-based project model all fold into a single unified resource.

The important bit is that last step. Microsoft didn’t just change the logo. They created a new resource type, unified the APIs, consolidated the SDKs, and redesigned how projects, endpoints, and access control work.

What Actually Changed (Architecturally)

Here’s the before-and-after, because this is where it gets genuinely interesting:

| Dimension | Before | After |
| --- | --- | --- |
| Resource model | Hub + Azure OpenAI resource + AI Services (separate resources) | Single Foundry resource with projects |
| Endpoints | 5+ separate service endpoints | One project endpoint |
| API versioning | Monthly api-version query parameters | Stable v1 routes (/openai/v1/) |
| SDKs | Multiple packages (Azure.AI.Inference, Azure.AI.Generative, Azure.AI.ML, AzureOpenAIClient) | Unified Azure.AI.Projects + standard OpenAI NuGet package |
| Agent API | Assistants API (Threads, Runs, Messages) | Responses API (Conversations, Items, Responses) |
| Portal | Foundry Classic | New Foundry portal at ai.azure.com |

In my opinion, the single biggest win here is the resource consolidation. Previously, building an agent that needed an OpenAI model, a search index, and content safety required three separate Azure resources, each with their own endpoint, RBAC, and networking config. Now it’s one Foundry resource with one project endpoint. The reduction in operational overhead is substantial.

The v1 API Routes: No More api-version Madness

If you’ve worked with Azure OpenAI, you know the pain of api-version query parameters. Every few months a new preview version drops, features move between versions, and you end up with code littered with version strings like 2024-08-01-preview that you’re never quite sure are current.

Microsoft Foundry introduces stable v1 routes. Instead of:

https://my-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-08-01-preview

You now use:

https://my-resource.openai.azure.com/openai/v1/responses

This means you can use the standard OpenAIResponseClient from the OpenAI NuGet package (not the Azure-specific AzureOpenAIClient wrapper) pointed directly at your Azure resource:

// Install packages:
// dotnet add package OpenAI
// dotnet add package Azure.Identity

using OpenAI;
using OpenAI.Responses;
using System.ClientModel.Primitives;
using Azure.Identity;

#pragma warning disable OPENAI001

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://ai.azure.com/.default");

OpenAIResponseClient client = new(
    model: "gpt-4.1",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    });

OpenAIResponse response = await client.CreateResponseAsync("What is Microsoft Foundry?");
Console.WriteLine(response.GetOutputText());

This is brilliant for portability. The same client code works against OpenAI directly or Azure OpenAI with just an endpoint swap. No more Azure-specific SDK wrappers or version parameter juggling.

One-Click Upgrade from Azure OpenAI

For existing Azure OpenAI users (which, let’s be honest, is most of us), the migration story is pretty straightforward. Microsoft offers a one-click upgrade that converts your Azure OpenAI resource into a Foundry resource while preserving:

  • Your existing endpoint URL
  • All API keys
  • Deployed models and their configurations
  • Existing state and data

After the upgrade, you get a Foundry resource with a default project. Your old code keeps working because the endpoint is preserved. You can then start using the new v1 routes and the Responses API at your own pace.

Be warned: The upgrade is one-way. Once you convert to a Foundry resource, you can’t go back to a standalone Azure OpenAI resource. In my opinion, that’s fine since there’s no feature regression, but it’s worth knowing before you click the button.

The Unified SDK: Azure.AI.Projects

The old SDK story was, to put it politely, fragmented. Depending on what you were doing, you might need Azure.AI.Inference, Azure.AI.Generative, Azure.AI.ML, or the OpenAI package with the AzureOpenAIClient wrapper. Each had slightly different authentication patterns and endpoint requirements.

The Azure.AI.Projects SDK consolidates this into a single project client. You authenticate once, and you get access to models, agents, evaluations, tracing, and tools through one interface.

For inference specifically, you can use the standard OpenAIResponseClient against the v1 routes (as shown above). For platform operations like creating agents, running evaluations, or configuring tools, you use the AIProjectClient. Two NuGet packages, clear separation of concerns.
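
Roughly, the split looks like this – a sketch reusing the constructor shapes from the snippets above, with one credential shared across both clients:

using Azure.Identity;
using Azure.AI.Projects;
using OpenAI;
using OpenAI.Responses;
using System.ClientModel.Primitives;

var credential = new DefaultAzureCredential();

// Platform operations (agents, evaluations, tools): Azure.AI.Projects
AIProjectClient projectClient = new(
    endpoint: new Uri("https://your-resource.ai.azure.com/api/projects/your-project"),
    tokenProvider: credential);

// Inference against the v1 routes: the standard OpenAI package
OpenAIResponseClient inferenceClient = new(
    model: "gpt-4.1",
    authenticationPolicy: new BearerTokenPolicy(credential, "https://ai.azure.com/.default"),
    options: new OpenAIClientOptions
    {
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    });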

Foundry Classic: What Stays Behind

Not everything has moved to the new portal yet. Microsoft is maintaining the Foundry Classic portal for scenarios that aren’t yet supported in the new experience:

  • Standalone Azure OpenAI resources not connected to a Foundry project
  • Assistants v1 creation and authoring
  • Audio playground
  • AI service fine-tuning
  • Content Understanding (moving to new portal soon)
  • Hub-based project workflows

If you’re using any of these, you’ll need to keep working in Classic for now. The good news is that Classic isn’t going away imminently, and Microsoft is actively migrating features across. The roadmap is to eventually have everything in the new portal.

What’s GA vs Preview

This is important for production workloads. At the time of writing, here’s the high-level readiness:

  • GA (production-ready): Model discovery and deployment, agent development (Agents v2), evaluations, fine-tuning, red teaming, RBAC, quota management, speech playgrounds
  • Preview (not yet production-ready): Workflows, tracing, monitoring, memory, guardrails for agents, knowledge, AI Gateway

If you’re running production workloads, I’d strongly recommend sticking to GA features and using the Foundry GA guide to validate your feature requirements before committing.

Who Is This Actually For?

Microsoft positions Foundry for three audiences, and I think this framing is spot on:

  1. Application developers building AI-powered products with agents, models, and tools. This is the primary audience, and the portal’s “Discover, Build, Operate” flow reflects this.
  2. ML engineers and data scientists who fine-tune models, run evaluations, and manage deployments.
  3. Platform engineers and IT admins who need to govern AI resources, enforce policies, and manage access across teams.

If you’re in that first group (and I suspect most readers of this blog are), the simplification from multiple resources and SDKs down to one Foundry resource and two packages is a genuine quality-of-life improvement.

Wrapping Up

Microsoft Foundry is not just another rebrand. It’s a genuine architectural consolidation that simplifies how we build, deploy, and manage AI applications on Azure. The unified resource model, stable v1 API routes, consolidated SDK, and one-click upgrade path all point to a platform that’s maturing rapidly.

Is it perfect? No. There are still features stuck in Classic, preview capabilities that aren’t production-ready, and the name changes have been genuinely confusing. But the direction is right, and for anyone starting fresh or planning a migration, Foundry is clearly where Microsoft wants you to be.

As always, feel free to reach out with any questions or comments!

Until next time, stay cloudy!

The Responses API: Azure OpenAI’s New Core Interface, and Why It Replaces Everything You Know

Recently I was knee-deep in refactoring an agentic workflow that had become, frankly, a bit of a Frankenstein’s monster. Half the logic was on Chat Completions with manually managed message arrays, the other half was bolted onto the Assistants API with Threads and Runs. State management was scattered across three different services, and I’d lost track of which conversation context lived where. Sound familiar?

Then Microsoft dropped the Responses API in the March 2025 release, and I’ll be honest, my first reaction was “not another API surface.” But after spending a weekend migrating that messy codebase, I’m a convert. This is, without a doubt, the most architecturally significant API change Azure OpenAI has shipped to date.

This post will cover what the Responses API actually is, how it differs from Chat Completions and the Assistants API, what the new built-in tools look like in practice, and how to start migrating. Let’s dive in!

So What Actually Is the Responses API?

In plain English, the Responses API is a single, unified interface that takes the conversational simplicity of Chat Completions and combines it with the stateful, tool-rich capabilities of the Assistants API. You get server-side state management, built-in tools, and multi-turn conversation chaining, all without needing to wrangle Threads, Runs, or separate API surfaces.

The key idea is that every call returns a response object with a unique ID. You can chain conversations together by passing previous_response_id from one call to the next. The server handles the context window for you. No more manually appending message arrays and praying you haven’t blown your token limit.

Here’s what a basic call looks like:

// Install packages:
// dotnet add package OpenAI
// dotnet add package Azure.Identity

using OpenAI;
using OpenAI.Responses;
using System.ClientModel.Primitives;
using Azure.Identity;

#pragma warning disable OPENAI001

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://ai.azure.com/.default");

OpenAIResponseClient client = new(
    model: "gpt-4.1-nano",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    });

OpenAIResponse response = await client.CreateResponseAsync("What's the capital of Australia?");
Console.WriteLine(response.GetOutputText());

If you’ve been using Chat Completions, you’ll notice a few things straight away. The endpoint is /openai/v1/responses (no more api-version query parameters on the v1 routes). The input is input rather than messages. And the output is a typed response object rather than a choices array. Pretty straightforward, right?

Chaining Conversations (Without the Headache)

The real magic is in multi-turn conversations. With Chat Completions, you had to manually track and resend the entire message history on every call. With the Responses API, you just pass the previous response ID:

// Initial request
OpenAIResponse response = await client.CreateResponseAsync("Explain quantum computing to me.");

// Follow up, the model has full context of the previous exchange
OpenAIResponse secondResponse = await client.CreateResponseAsync(
    "Now explain that like I'm five.",
    new ResponseCreationOptions()
    {
        PreviousResponseId = response.Id
    });

Console.WriteLine(secondResponse.GetOutputText());

The server retains the conversation state for 30 days by default. No more client-side arrays growing unbounded. No more accidentally dropping system messages when you truncate for token limits. The server just handles it.
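
Because state lives server-side, you can also fetch a stored response again later – or clean it up early. A sketch: GetResponseAsync appears in the background-tasks example further down, while DeleteResponseAsync is an assumption to verify against your SDK version:

// Retrieve a stored response by ID, e.g. from a different process
OpenAIResponse stored = await client.GetResponseAsync(response.Id);

// Or delete it before the 30-day retention window expires
await client.DeleteResponseAsync(response.Id);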

Built-in Tools: Web Search, Code Interpreter, MCP, and More

This is where it gets genuinely exciting. The Responses API ships with a suite of built-in tools that previously required either the Assistants API or custom middleware:

  • Web Search via Bing grounding, so your model can fetch real-time information
  • Code Interpreter for running Python in a sandboxed container
  • File Search for document retrieval
  • Image Generation with gpt-image-1
  • Remote MCP servers, connecting to the broader Model Context Protocol ecosystem

Here’s web search in action:

ResponseCreationOptions options = new();
options.Tools.Add(ResponseTool.CreateWebSearchTool());

OpenAIResponse response = await client.CreateResponseAsync(
    "What were the key announcements at Microsoft Build 2025?", options);

Console.WriteLine(response.GetOutputText());

That’s it. One parameter. The model decides when to search, executes the search, and synthesises the results into its response. No Bing API keys, no custom function calling plumbing, no stitching results together yourself.

And for MCP, you can point the model at any remote MCP server:

ResponseCreationOptions mcpOptions = new();
mcpOptions.Tools.Add(ResponseTool.CreateMcpTool(
    serverLabel: "my_tools",
    serverUri: new Uri("https://my-mcp-server.example.com"),
    toolCallApprovalPolicy: new McpToolCallApprovalPolicy(
        GlobalMcpToolCallApprovalPolicy.NeverRequireApproval)
));

OpenAIResponse response = await client.CreateResponseAsync(
    "Search for information about Azure Functions", mcpOptions);

This is a big deal. It means your Azure OpenAI models can now natively consume tools from the entire MCP ecosystem, the same protocol used by Claude, GitHub Copilot, and a growing number of developer tools.

Context Compaction: Managing Long Conversations

One of my favourite features (and one that would have saved me a lot of pain in that Frankenstein codebase) is context compaction. When your conversation history grows large, the API can automatically summarise earlier turns to fit within the model’s context window.

You can enable server-side compaction like this:

// 'conversation' is your accumulated list of ResponseItems from earlier turns
ResponseCreationOptions compactionOptions = new()
{
    Input = conversation,
};
compactionOptions.ContextManagement.Add(new ResponseContextManagement()
{
    Type = "compaction",
    CompactThreshold = 200000
});

OpenAIResponse response = await client.CreateResponseAsync(compactionOptions);

When the token count crosses your threshold, the service automatically compacts the context and emits an encrypted compaction item. You don’t need to do anything special; just keep appending output items to your input array as normal.

For those of you running long-lived agent loops or multi-step workflows, this is a game-changer. No more writing custom summarisation logic or awkwardly truncating message histories.
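
In practice the loop looks something like this – a sketch that assumes an OutputItems collection on the response and passes the input list directly to CreateResponseAsync:

// Keep one running input list; the service compacts it when it crosses the threshold
List<ResponseItem> conversation =
    [ResponseItem.CreateUserMessageItem("Review this log file for errors...")];

OpenAIResponse turn = await client.CreateResponseAsync(conversation, compactionOptions);

// Append the model's output, then the next user turn, and call again
conversation.AddRange(turn.OutputItems);
conversation.Add(ResponseItem.CreateUserMessageItem("Now check the warnings too."));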

Background Tasks for Long-Running Workflows

Need to kick off a complex reasoning task that might take a few minutes? The Responses API supports background mode:

ResponseCreationOptions backgroundOptions = new()
{
    AllowBackgroundExecution = true
};

OpenAIResponse response = await client.CreateResponseAsync(
    "Analyse this codebase and suggest architectural improvements.",
    backgroundOptions);

// Poll for completion
while (response.Status == "queued" || response.Status == "in_progress")
{
    await Task.Delay(2000);
    response = await client.GetResponseAsync(response.Id);
}

Console.WriteLine(response.GetOutputText());

This is particularly useful with reasoning models like o3 and o4-mini, where complex tasks can take longer than a typical HTTP timeout allows. You fire the request, go make a coffee, and poll for the result.

So What Happens to Chat Completions?

Good news: Chat Completions isn’t going anywhere. Microsoft has been clear that it remains a supported, stable API for straightforward text generation. But here’s the honest truth: every new feature is landing on the Responses API first. Built-in tools, MCP, compaction, background tasks, computer use… none of these are coming to Chat Completions.

In my opinion, if you’re starting a new project today, use the Responses API. If you have an existing Chat Completions integration that’s working well, there’s no rush to migrate. But if you’re building anything agentic, anything with tools, or anything that needs multi-turn state management, the Responses API is the clear choice.

Here’s a quick comparison to help you decide:

| Capability | Chat Completions | Responses API |
| --- | --- | --- |
| Basic text generation | Yes | Yes |
| Multi-turn state management | Manual (client-side) | Automatic (server-side) |
| Built-in web search | No | Yes |
| Code interpreter | No (Assistants only) | Yes |
| Remote MCP servers | No | Yes |
| Context compaction | No | Yes |
| Background async tasks | No | Yes |
| Computer use | No | Yes |
| Function calling | Yes | Yes |

Getting Started

Migration is pretty straightforward. The key changes, sketched in code after the list, are:

  1. Endpoint: Move from /chat/completions (with api-version params) to /openai/v1/responses
  2. Client: Use the OpenAIResponseClient with a BearerTokenPolicy pointed at your Azure resource’s /openai/v1 endpoint
  3. Input: Replace messages arrays with input (and optionally instructions for system-level guidance)
  4. Output: Read from response.GetOutputText() instead of response.Choices[0].Message.Content
  5. Multi-turn: Replace manual message history with PreviousResponseId
  6. Function calling: Schemas are now strict by default (which is honestly better practice anyway)
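
In code, the before and after of a simple exchange looks roughly like this – the Chat Completions half is commented out and assumes the OpenAI package's ChatClient; the Responses half reuses the client from earlier:

// Before: Chat Completions, client-side message history
// List<ChatMessage> history = [new UserChatMessage("What's new in .NET 8?")];
// ChatCompletion completion = await chatClient.CompleteChatAsync(history);
// Console.WriteLine(completion.Content[0].Text);

// After: Responses API, server-side state
OpenAIResponse response = await client.CreateResponseAsync("What's new in .NET 8?");
Console.WriteLine(response.GetOutputText());

// Multi-turn: no history array, just chain the IDs
OpenAIResponse followUp = await client.CreateResponseAsync(
    "Summarise that in one sentence.",
    new ResponseCreationOptions { PreviousResponseId = response.Id });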

The official migration guide from OpenAI has the full details, and the Azure-specific how-to guide covers the Azure endpoints and authentication patterns.

Note: The Responses API is available in Australia East (among many other regions), so those of us down under get local latency. Always a welcome bonus.

Wrapping Up

The Responses API is the most significant API evolution Azure OpenAI has shipped. It simplifies state management, brings powerful built-in tools to every developer, and opens the door to the MCP ecosystem. If you’ve been juggling Chat Completions and the Assistants API like I was, the unification alone is worth the migration effort. And with features like context compaction and background tasks, it’s clearly built for the agentic workloads that are defining 2025.

As always, feel free to reach out with any questions or comments!

Until next time, stay cloudy!

Azure Spring Clean – 5 Tips to help you align to Enterprise Scale

Azure Spring Clean – easily one of my favourite Azure events of the year. I spend a lot of my year helping organisations clean up their Azure tenancies, so even though I’m writing this as Australia enters autumn, I’m super pumped to take you through my contribution for 2022: five tips to help you start your own Enterprise Scale journey, today.

For those who haven’t heard of Enterprise Scale Landing Zones (ES) before – it’s a bloody straightforward concept. Microsoft has developed a number of Azure best practices over the years, reflected in the Cloud Adoption and Well-Architected Frameworks. Enterprise Scale is guidance on how best to use these techniques in your environment.

This article will take you through five tips for customers who already have an Azure deployment, albeit not really aligned to the ES reference architectures. Microsoft also provides guidance on this process here. Let’s dive right in!

1. Understand the right reference architecture for you!

While Enterprise Scale (ES) is generic in implementation, every organisation is unique. As such, Microsoft provides multiple options for organisations considering ES. Factors such as your size, growth plans or team structure will all influence your design choices. The first tip is pretty simple – understand where you currently are compared to the available architectures.

The four reference architectures that Microsoft provides for ES are:

Each Enterprise Scale pattern builds in capability

Note: The ES reference architectures that Microsoft provides here aren’t the only options; Cloud Adoption Framework clearly allows for “Partner Led” implementations which are often similar or a little more opinionated. Shameless Plug 😉 Arinco does this with our Azure Done Right offering.

2. Implement Management Groups & Azure Policy

Once you have selected a reference architecture, you then need to begin aligning. This can be challenging, as you’re more than likely already using Azure in anger. As such, you want to make changes with minimal effort but a high return on investment. Management Groups & Policy are without a doubt the clear winners here, even for single-subscription deployments.

Starting simple with Management Groups is pretty easy and allows you to segment subscriptions as you grow and align. Importantly, Management Groups will help you to target Azure Policy deployments.

A simple structure here is all you need to get going – Production/Development is an easy line to draw, but it’s really up to you. In the below plan, I’ve segmented Prod and Dev, Platform and Landing Zone, and finally individual products. Use your own judgement as required. A word from the wise: don’t go too crazy, as you can continue to segregate with subscriptions and resource groups.
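
An illustrative hierarchy along those lines – the platform subdivisions are a common Cloud Adoption Framework pattern, so adapt the names and depth to your organisation:

Tenant Root
├── Platform
│   ├── Management
│   ├── Connectivity
│   └── Identity
└── Landing Zones
    ├── Production
    │   ├── Product A
    │   └── Product B
    └── Development
        ├── Product A
        └── Product B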

Once you’ve set up Management Groups, it’s time to limit any future re-work and minimise effort for changes. Azure Policy is perfect for this, and you should create a Policy initiative which enforces your standards quickly – for example, requiring resource tags, restricting deployment regions, or enforcing diagnostic settings.

If you haven’t spent much time with Azure Policy, the AWESOME-Azure-Policy repository maintained by Jesse Loudon has become an amazing source for anything you would want to know here!

3. Develop repeatable landing zones to grow in.

The third tip I have is probably the most important for existing deployments. Most commonly, non-ES organisations operate in a few monolithic subscriptions, sometimes with a few resource groups to separate workloads. In the same way that microservices allow development teams to iterate on applications faster, Landing Zones allow you to develop capability on Azure faster.

A Landing Zone design always differs slightly by organisation, depending on the Azure architecture you selected and your business requirements.

Some things to keep in mind for your LZ design pattern are:

  • How will you network each LZ?
  • What security and monitoring settings are you deploying?
  • How will you segment resources in a LZ? Single Resource Group or Multiple?
  • What cost controls do you need to apply?
  • What applications will be deployed into each LZ?

A Microsoft Example LZ design

There’s one common consideration that I’ve intentionally left off the above list:

  • How will you deploy a LZ?

The answer for this should be Always as Code. Using ARM Templates, Bicep, Terraform, Pulumi or any IaC allows you to quickly deploy a new LZ in a standardised pattern. Microsoft provides some excellent reference ARM templates here or Terraform here to demonstrate exactly this process!

4. Uplift security with Privileged Identity Management (PIM)

I love PIM. It’s without a doubt my favourite service on Azure. If you haven’t heard of PIM before (how?), PIM focuses on applying approved administrative access within a time-boxed period. This works by automatically removing administrative access when not required, and requiring approval with strong authentication to re-activate it. You can’t abuse an administrator account that has no admin privileges.

While the Enterprise Scale documentation doesn’t harp on the benefits of PIM, the IAM documentation makes it clear that you should be considering your design choices and that’s why using PIM is my fourth tip.

I won’t deep-dive into the process of using PIM; the eight steps you need are already documented. What I will say is: spend the time to onboard each of your newly minted landing zones, and then begin to align your existing subscriptions. This process will give you a decent baseline of access which you can compare against when minimising ongoing production access.

5. Minimise cost by sharing platform services

Cost is always something to be conscious of when operating on any cloud provider, and my final tip focuses on the hip pocket for that reason. Once you are factoring things like reserved instances, right-sizing and charge-back models into your landing zones, this final tip can really let you eke the most out of a limited cloud spend. That being said, it also requires a high degree of maturity within your operating model: you must have a strong understanding of how your teams are operating and deploying to Azure.

Within Azure, there is a core set of services which provide a base capability you can deploy on top of. Key items which come to mind here are:

  • AKS Clusters
  • App Service Plans
  • API Management instances
  • Application Gateways

Once you have a decent landing zone model and Enterprise Scale alignment, you can begin to share certain services. Take the below diagram as an example. Rather than building a single plan per app service or function, sharing one plan helps to reduce the operating cost across all the resources. In the same way, a platform team might use the APIM DevOps Toolkit to provide a shared APIM instance.

Note that multiple different functions are using the same app service plan here.

Considering this capability model when you develop your alignment is an easy way to minimise the work required to move resources to a new Enterprise Scale deployment. In my opinion, consolidating Kubernetes pods or APIM APIs is a lot easier than moving clusters or Azure resources between landing zones.

Note: While technically possible, try to avoid sharing IaaS virtual machines. This does save cost, but encourages using the most expensive Azure compute. You want to push engineering teams towards cheaper and easier PaaS capabilities where possible.

Final Thoughts

Hopefully you have found some value in this post and my tips for Enterprise Scale alignment. I’m really looking forward to seeing some of the community generated content. Until next time, stay cloudy!

GitHub Advanced Security – Exporting results using the Rest API

Recently while working on a code uplift project with a customer, I wanted a simple way to analyse our Advanced Security results. While the GitHub UI provides easy methods for basic analysis and prioritisation, we wanted to complete our reporting and detailed planning off-platform. This post will cover the basic steps we followed to export GitHub Advanced Security results to a readable format!

Available Advanced Security API Endpoints

GitHub provides a few API endpoints for Code Scanning which are important for this process. Two are used today: the repository alerts endpoint (/repos/{owner}/{repo}/code-scanning/alerts) and the organisation alerts endpoint (/orgs/{org}/code-scanning/alerts).

This post will use PowerShell as our primary export tool, but reading the GitHub documentation carefully should get you going in your language or tool of choice!

Required Authorisation

As a rule, all GitHub API calls should be authenticated. While you can implement a GitHub application for this process, the easiest way is to use an authorised Personal Access Token (PAT) for each API call.

To create a PAT, navigate to your account settings, then to Developer Settings and Personal Access Tokens. Exporting Advanced Security results requires the security_events scope, shown below.

The PAT scope required to export Advanced Security results

Note: Organisations which enforce SSO will require a secondary step where you log into your identity provider, like so:

Authorising for an SSO enabled Org

Now that we have a PAT, we need to build the basic authorisation API headers as per the GitHub documentation.

  $GITHUB_USERNAME = "james-westall_demo-org"
  $GITHUB_ACCESS_TOKEN = "supersecurepersonalaccesstoken"

  $credential = "${GITHUB_USERNAME}:${GITHUB_ACCESS_TOKEN}"
  $bytes = [System.Text.Encoding]::ASCII.GetBytes($credential)
  $base64 = [System.Convert]::ToBase64String($bytes)
  $basicAuthValue = "Basic $base64"
  $headers = @{ Authorization = $basicAuthValue }

Exporting Advanced Security results for a single repository

Once we have an appropriately configured auth header, calling the API to retrieve results is really simple! Set your values for API endpoint, organisation and repo and you’re ready to go!

  $HOST_NAME = "api.github.com"
  $GITHUB_OWNER = "demo-org"
  $GITHUB_REPO = "demo-repo"

  $response = Invoke-RestMethod -FollowRelLink -Method Get -UseBasicParsing -Headers $headers -Uri https://$HOST_NAME/repos/$GITHUB_OWNER/$GITHUB_REPO/code-scanning/alerts

  $finalResult += $response | %{$_}

The above code is pretty straightforward, with the URL being built by providing the “owner” and repo name. One thing we found a little unclear in the doco was who the owner is. For a personal public repo this is obvious, but for our GitHub EMU deployment we had to set this as the organisation instead of the creating user. Once we have a URI, we call the API endpoint with our auth headers for a standard REST response. Finally, we parse the result to a nicer object format (due to the way the Invoke-RestMethod -FollowRelLink parameter works).

The outcome we quickly achieve using the above is a PowerShell object which can be exported to parsable JSON or CSV formats!

Exported Advanced Security Results
Once you have a PowerShell Object, this can be exported to a tool of your choice
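
For example, dumping the flattened alerts to CSV or JSON is a one-liner from here (field names follow the code scanning alerts schema):

  # CSV for planning spreadsheets
  $finalResult | Select-Object number, state, @{n='rule';e={$_.rule.id}}, @{n='severity';e={$_.rule.severity}}, html_url |
      Export-Csv -Path ./advanced-security-alerts.csv -NoTypeInformation

  # JSON for further processing
  $finalResult | ConvertTo-Json -Depth 10 | Out-File ./advanced-security-alerts.json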

Exporting Advanced Security results for an entire organisation

Depending on the scope of your analysis, you might want to export all the results for your GitHub organisation. This is possible, however it does require elevated access – your account must be an administrator or security manager for the org.

  $HOST_NAME = "api.github.com"
  $GITHUB_ORG = "demo-org"

  $response = Invoke-RestMethod -FollowRelLink -Method Get -UseBasicParsing -Headers $headers -Uri https://$HOST_NAME/orgs/$GITHUB_ORG/code-scanning/alerts

  $finalResult += $response | %{$_}

Securing Privileged Access with Azure AD (Part 3) – Hybrid Scenarios

While many organisations are well on the journey to exclusively operating in the cloud, the reality is that most companies operate in a hybrid state for an extended period of time. As such, we cannot focus all of our Privileged Access effort on securing only the cloud. In this post, I’ll walk you through three simple methods which allow you to extend Azure AD capability into an on-premise environment, with the support of key “legacy” technology. If you’re just joining us for this series, head over to part one to learn about strategy, or part two for the Azure AD basics!

1. Reducing privileged access on premise with PIM

One of the challenges that many organisations perceive with PIM is that it doesn’t extend to on-premise services. This perception is wrong. Yes, PIM itself doesn’t have native capability for on-premise, but it is extremely simple to consume PIM groups within an on-premise environment. This can be done in two ways:

  1. Custom group write-back using Microsoft Identity Manager (MIM)
  2. Automation write-back using a script, automation account or logic app

Both of these options require a pragmatic approach to deployment trade-offs. For MIM group write-back, precise time-bound access doesn’t really work: MIM generally syncs on a pre-defined schedule, so you would need to configure PIM lifespans to cater for this, leaving some wriggle room on either side of the PIM window. And some companies prefer not to run custom-built integration, so sync scripts running on your own schedule are avoided.

Thankfully, the community has put some excellent effort into this space, with by far the best example of this being the goodworkaround write-back script.

Sync Privileged Access from Azure to Active Directory with custom scripts.
Visualisation of the Hybrid scenario. Source: github.com

2. Forcing MFA for administrative access using Windows Admin Center

Regardless of how you choose to manage group membership for administrative access, sometimes the simplest security control is the best. MFA is by far the most effective control you can apply to admin logins.

But how to achieve this? Unfortunately, Windows Server still doesn’t include native support for Azure AD MFA inside the RDP UI (some secondary products like Duo or Okta have solutions for this). Sure, this is a bit of a bummer, but let’s be honest: direct RDP access to a server should NOT be required in the majority of scenarios, for two reasons:

  • Infrastructure as Code – If you’re able to configure a server to be replaced by a pipeline, you should. Maintenance and incident remediation is a lot easier when you can simply replace the infrastructure at the click of a button, without ever logging in.
  • Remote shell – You can do pretty much anything from the command line or PowerShell these days. In my opinion, RDP by default isn’t worth the security hassle. Restrict RDP usage and move to the CLI.

If you’re not comfortable in this space, or would just like an excellent solution which lets you monitor and configure multiple servers, Microsoft provides a world-class solution for remote management: Windows Admin Center (WAC). In my opinion, it is highly under-utilised and a great addition to any IT Pro’s toolkit.

Thankfully, Windows Admin Center has native support for Azure AD authentication. Using Conditional access, you can then apply MFA to admin access.

Managing server Privileged Access with Windows Admin Centre

Configuring this within WAC is a straightforward task, with the settings for Azure AD Authentication available under the “Settings > Access” blade:

Once enabled, you will be able to locate an Admin Center application within your Azure AD Tenant, which you can utilise to scope a targeted Conditional Access policy.



For this capability to truly be effective, you can also combine the WAC solution with an RD Gateway for RDP scenarios. Because RD Gateways operate using a Connection Authorisation Policy with NPS, you can quickly apply MFA to user sessions with the NPS extension. Be warned, this does add a small configuration overhead and occasionally a “double auth” scenario.

3. Extending Azure AD to networking infrastructure using SSO Integration or Network Policy Server

A lot of focus is generally exerted by IT teams on securing server infrastructure. But what about the network? As discussed in our strategy post, a holistic approach to privileged access includes all the solutions you manage. As the network carries all traffic for your solutions, some security practitioners will argue that securing this access is more important than securing the infrastructure!

Because networking infrastructure is so diverse, there are generally two distinct ways to enhance network privileged access security:

  1. Integrate Azure AD with a centralised control plane. This requires standardisation of access through a network vendor solution.
  2. Integrate networking devices with AAD via RADIUS. This requires support for specific RADIUS protocols on your network devices.

The first option is, in my opinion, the best one. Nearly every networking vendor these days provides a secure access control mechanism: Cisco has Identity Services Engine, Aruba uses ClearPass, Palo Alto uses Panorama – the list goes on for miles. Because these native tools integrate directly with access control for your networking appliances, it can be an extremely quick win to apply SSO via Azure AD and MFA via Conditional Access. You can then combine this with services like Privileged Identity Management (PIM) to manage access through approval flows and group claims. Each of your networking vendors will provide documentation for this.

The second option works in privileged access scenarios where you don’t have a centralised identity service. Provided you can use the correct RADIUS protocols, admins can configure the Azure MFA extension for NPS, with RADIUS integration enabling MFA for your networking kit! In the below example, I use this to apply MFA to an SSH management interface for a Palo Alto firewall.

Managing Privileged Access for SSH using Radius and the MFA Extension

Up Next

Using the above three techniques, you very quickly end up with a potential architecture that might look like this.

Thanks for sticking with me through this week’s guidance on Hybrid Scenarios. If you’re after some more Privileged Access information, have a read of my AAD Basics guidance, or stay tuned for more info on what can be done using Azure AD, including some tips and techniques to help you stay in control. The topics this series covers are:

  1. Strategy & Planning
  2. Azure AD Basics
  3. Hybrid Scenarios (This Post)
  4. Zero Trust
  5. Protecting Identity
  6. Staying in Control

Until next time, stay cloudy!

Originally Posted on arinco.com.au

Easy management of Github Wikis with Actions

Recently Arinco has made an internal move to GitHub Enterprise for some of our code storage. For the most part, this has been a seamless process. All of our code is agnostic, and we support customers using both Azure DevOps and GitHub already. While supporting this move, some consideration was made for how best to manage documentation. We’ve found the Azure DevOps wiki feature to be extremely useful: it provides a suitable UI for business users to modify documentation, while also enabling developer-friendly markdown. GitHub provides a similar capability using its own wiki feature.

On investigating the process for wiki usage within GitHub, we noticed an interesting difference to Azure DevOps – GitHub stores wiki files in a separate repo. This can be quickly seen when you navigate to the wiki tab and are presented with a second git URL to clone.

Our extra GitHub wiki repository
Another repo to manage? No thanks.

Now while this works in the same manner as Azure DevOps for developer/business scenarios, managing two repositories is annoying. Git does support adding the wiki as a submodule, however developers are required to complete a double commit and to some, the submodule UI on GitHub is a bit clunky.

To solve this challenge, we turned to the community, specifically looking for a pre-canned GitHub Action.
Thankfully this isn’t a new complaint from the community, and SwiftDoc had already created an action. After setting up a PAT and running a couple of tests with this, we found some behaviour annoying on the developer side: files are never deleted, only created, and directory structure is not preserved. And so, we have a slightly modified action:


name: Update Wiki
on:
  workflow_dispatch:
  push:
    branches: [ main ]
    paths:
      - 'wiki/**'
jobs:
  Update-Wiki:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: update wiki data
        env:
          GH_PERSONAL_ACCESS_TOKEN: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
          WIKI_DIR: wiki
        run: |
          echo $GITHUB_ACTOR
          WIKI_COMMIT_MESSAGE='Automatically publish wiki'
          GIT_REPOSITORY_URL="https://${GH_PERSONAL_ACCESS_TOKEN}@${GITHUB_SERVER_URL#https://}/$GITHUB_REPOSITORY.wiki.git"
          echo "Checking out wiki repository"
          tmp_dir=$(mktemp -d -t ci-XXXXXXXXXX)
          (
              cd "$tmp_dir" || exit 1
              git init
              git config user.name $GITHUB_ACTOR
              git config user.email $GITHUB_ACTOR@users.noreply.github.com
              git pull "$GIT_REPOSITORY_URL"
              echo "Removing Files, ensuring deletion."
              rm -r -f ./*
          ) || exit 1

          echo "Copying contents of $WIKI_DIR"
          cp -R $WIKI_DIR "$tmp_dir"
          
          echo "Committing and pushing changes"
          (
              cd "$tmp_dir" || exit 1
              git add .
              git commit -m "$WIKI_COMMIT_MESSAGE"
              git push --set-upstream "$GIT_REPOSITORY_URL" master
          ) || exit 1

          rm -rf "$tmp_dir"

This action doesn’t really cater as well for the business/developer split (files created in the GUI will be deleted), but for us, this works just fine and isn’t annoying. Until next time, stay cloudy!

Enabling Password-less in Azure AD with Feitian security keys!

Recently I was lucky to receive some evaluation security keys from Feitian – One of the select companies currently providing Microsoft tested FIDO2 hardware. I’ve always been passionate about enabling Windows Hello for business, so the chance to get even more password-less was something I leaped at.

If you haven’t used a FIDO key before, the first things you’ll want to know are: what are they, how do I enable their usage, and how do I use them? Thankfully, the answers to these questions are pretty simple.

My new FEITIAN K40 and K26 security keys

What is a FIDO2 Key?

Most people reading this post will currently maintain some form of password. The key detail: this is generally a single password, maybe with a few permutations (hats off to the one in every five people using a password manager). These passwords are never very good – hard to remember, simple to steal, easy to brute-force, and ripe for massive re-use when stolen. FIDO is a solution to this nightmare.

FIDO as a concept is pretty easy to understand – you own a cryptographically verifiable key, and this is used to authenticate to your services. Because FIDO allows you to physically own something containing your security info, organisations can generate long and complex secrets without you having to memorise them. Most importantly, this security data stays with you (sometimes even locked by your biometrics). Possession is nine-tenths of the law, and with FIDO it is much harder for malicious entities or hackers to break into your accounts.

As a protocol, FIDO has a fair bit of minutiae to understand. Microsoft provides an excellent summary within their existing password-less documentation. If you really enjoy long boring technical documents, the Technical Specs for FIDO2 from the FIDO alliance and W3C can be found here.

Enabling security key usage in Azure AD

Enabling security key authentication within Azure AD is a pretty straightforward process. If you have an older AAD tenant, you want to make sure that combined security registration is enabled (on by default since August 15th, 2020).

To do so, navigate to Azure AD user settings, and ensure the combined option is set to all users.

Enable the combined security info experience for users

Next, to enable security keys, navigate to Security > Authentication Methods.

By selecting FIDO2 Security Key you can enable this authentication method for a select group or all users. There isn’t any major penalty for enabling this for all users; however, if you’re completing this task as part of a dedicated rollout, you may want to consider who should have a key, or whether you want to configure allowed AAGUIDs using data provided by your manufacturer.

Setting up and using a key

Now that we’ve enabled the service, it’s time for the part we are all keen for – actually using a key! To do so, plug your security key into your PC. Next, I would recommend installing the relevant configuration software for your device. This allows you to configure any settings available on your key.

In the case of my FEITIAN K26 key, I have the option to configure an extra biometric – my fingerprint. This is great, as I’m now protecting the access that this key grants with my unique properties!

Once you’ve configured your key settings, it’s time to connect the key to Azure AD. Log in to the MyApps portal. From here, you can use the security info page to add the FIDO key as a sign-in method.

The Security Info Page – Select Add Method, then Security Key
Follow the Bouncing Ball to configure your key.

Once setup, your next login should have the option to login with a security key!

The Security Key User prompt!

A few weeks with my keys

After spending some time using these security keys, I’m thoroughly enjoying the change. They simplify login and provide me with a degree of confidence in my account security. As far as product feedback goes, I have had no issues with the FEITIAN keys – just personal niggles which will vary between users. The build quality is great, with the inbuilt loops letting me easily attach the keys to my keyring for on-the-go use. I accepted two USB-C devices, which I surprisingly found challenging: as a Mac user, Apple has pushed most of my devices to USB-C, and I thought I was all done with USB-A, but I didn’t think of my corporate devices, meaning I wasn’t able to use the keys there. Form-factor wise, the devices could be a bit smaller, with the larger keys being a little concerning when moving my laptop around – I was really worried I would snap one off. FEITIAN offers the K21 and K28 keys with a slimline build, so next time I might grab a pair of those!

A big thank you to Della @ FEITIAN for the opportunity to test these keys. Until next time – stay cloudy!

Securing Privileged Access with Azure AD (Part 2) – The AAD Basics

Welcome back to my series on securing privileged access. In this post, I’m going to walk you through five basic concepts which will allow you to keep your identity secure when using Azure AD. If you missed part one on building your PAM strategy, head over here to learn about the rationale and mentality behind securing privileged access and why it should be part of your cybersecurity strategy.

1. Azure AD Groups

This might seem a bit simple, but if you’re not using group assignments wherever possible, you’re doing something wrong. Assigning applications, roles, resources and service ownership to a group makes everything easier when building a privileged access deployment. If you’re starting out, this is fairly easy to implement. If you’re already using Azure AD, an afternoon is all you need to convert the majority of role assignments to groups for Azure AD (your mileage may vary for Azure IAM!).

When assigning access, develop role and access groups with the following mantra in mind:

Mutually Exclusive, Collectively Exhaustive (MECE).

This mantra will help you nest groups together in a fashion that ensures your administrators have access to all the services they need. Take a helpdesk admin as an example. Create a group for each of the Helpdesk Administrator, Global Reader and Teams Communications Support Engineer roles, then nest a “Helpdesk Admin Users” group within each. As separate access assignments, these role groups are mutually exclusive. Once the user group is nested within them, the assignments become collectively exhaustive. As an added benefit, applying the MECE process to role group assignment will make Identity Governance activities like Segregation of Duties easier!

Make the new group eligible for privileged access assignment
Assigning Privileged Access to Azure AD Groups requires you to enable role assignment on creation
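
If you’d rather script this than click through the portal, here’s a minimal sketch using the Microsoft Graph .NET SDK (Microsoft.Graph 5.x is assumed; the group name is just an example). The key detail is IsAssignableToRole, which can only be set at creation time – exactly what the screenshot above shows:

// dotnet add package Microsoft.Graph
// dotnet add package Azure.Identity

using Azure.Identity;
using Microsoft.Graph;
using Microsoft.Graph.Models;

// The signed-in identity needs a role permitted to create
// role-assignable groups (e.g. Privileged Role Administrator).
var graphClient = new GraphServiceClient(new DefaultAzureCredential());

var group = new Group
{
    DisplayName = "Helpdesk Admin Users", // example name
    MailNickname = "helpdesk-admin-users",
    MailEnabled = false,
    SecurityEnabled = true,
    // Must be set when the group is created; it cannot be enabled later.
    IsAssignableToRole = true,
};

var created = await graphClient.Groups.PostAsync(group);
Console.WriteLine($"Created group {created?.DisplayName} ({created?.Id})");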

Pro Tip: Dynamic Groups are a great way to grant low-privileged access to business services and minimise operational overhead. However, you need to be aware of lateral movement paths – if users can edit the attribute that dynamic membership is tied to, they may be able to bypass your identity design.

2. Conditional Access (CA)

Easily the most effective identity security control for organisations to implement is Multi-Factor Authentication. Microsoft has made no secret of its opinion on MFA, even touting that MFA prevents 99.9% of identity-based attacks.

In its simplest form, a Conditional Access rule applies a set of logic to each sign-in which occurs against Azure AD. Combine Conditional Access with ubiquitous Azure AD integration and you can secure a large number of applications with a single control.

Conceptual Conditional Access process flow
Conditional Access is a great solution for securing Privileged Access

If you want the fastest Conditional Access setup ever, apply the Multi-Factor Authentication sign-in control to All Users, for All Applications, on every sign-in.

While this would technically work, I wouldn’t recommend this approach, and the reason is simple – it degrades trust in your MFA setup. As security practitioners, we know that our users slowly grow accustomed to an enforced behaviour. If you set up Conditional Access to prompt for MFA frequently without a clear scenario, you will very quickly find that MFA is almost useless, as users accept every MFA prompt they see without thought or consideration. If you don’t have time to configure Conditional Access properly, enable the Azure AD security defaults instead.

A better approach to Conditional Access is to define your scenarios. In the case of Privileged Access, you have a few critical scenarios where Conditional Access configurations should be applied. These are:

  1. MFA registration from outside your operating country. Block this – hackers shouldn’t be able to enrol MFA tokens for breached accounts.
  2. Login for Azure, Azure AD and integrated SaaS admin accounts. Require MFA and secure endpoints for all sessions (a sketch of this policy follows this list).
  3. High risk logins. Block all or most of these events, and require a password reset by another administrator.
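
To make scenario two concrete, here’s a hedged sketch of such a policy via the Microsoft Graph .NET SDK. The display name is made up, the role template ID is the well-known Global Administrator ID, and the policy starts in report-only mode so you can validate its impact before enforcing it:

using Azure.Identity;
using Microsoft.Graph;
using Microsoft.Graph.Models;

var graphClient = new GraphServiceClient(new DefaultAzureCredential());

var policy = new ConditionalAccessPolicy
{
    DisplayName = "Require MFA - Privileged Roles", // example name
    // Report-only: evaluate and log, but don't enforce yet.
    State = ConditionalAccessPolicyState.EnabledForReportingButNotEnforced,
    Conditions = new ConditionalAccessConditionSet
    {
        Users = new ConditionalAccessUsers
        {
            // Target users holding the Global Administrator role.
            IncludeRoles = new List<string>
            {
                "62e90394-69f5-4237-9190-012177145e10"
            },
        },
        Applications = new ConditionalAccessApplications
        {
            IncludeApplications = new List<string> { "All" },
        },
    },
    GrantControls = new ConditionalAccessGrantControls
    {
        Operator = "OR",
        BuiltInControls = new List<ConditionalAccessGrantControl?>
        {
            ConditionalAccessGrantControl.Mfa,
        },
    },
};

var created = await graphClient.Identity.ConditionalAccess.Policies.PostAsync(policy);
Console.WriteLine($"Created policy '{created?.DisplayName}' in report-only mode");

Watch your sign-in logs while the policy is in report-only mode, then flip the state to enabled once you’re happy with the results.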

3. Split administrative accounts

For the security aficionados reading this post, the “minimal blast radius” concept should be quite familiar. For those of you newer to security, the idea is that one small breach should be isolated by default, rather than cascading into one big breach.

The easiest way to do this for privileged access is to split up your key administrator accounts: one admin for Azure AD, one admin for Active Directory and one admin for your external SaaS applications. A prominent recent example of this control not being applied was the Solorigate attack against SolarWinds customers. In this attack chain, an on-premises breach was used to compromise cloud administrator accounts using forged ADFS tokens. With segregated admin accounts, the impact of this attack would have been reduced – you can’t log into a cloud-only global admin account with an ADFS token.

Microsoft recommends you separate admin accounts in hybrid environments

If you’re on the fence about this control because it may seem inconvenient for day-to-day operations, consider the following.

Good identity controls are automatic

As you spend more time investing in advanced identity capability, you will notice that the operational overhead of identity decreases. It might start out challenging, but over time you will rely less on highly privileged roles such as Global Administrator.

4. Configure and monitor break glass accounts

Setting up privileged access management is an important process, and perhaps the most critical step within it is having a plan for when things go wrong. It’s OK to admit it – everyone makes mistakes. Services have outages, or sometimes you just click the wrong button. A break glass account is your magical get-out-of-jail card for these exact scenarios. If you don’t spend the two minutes to set these up, you will definitely curse when you find them missing.

There are a couple of things you should keep in mind when creating break glass accounts. Firstly, how will this access be stored and secured? Organisations may opt to vault credentials in a password manager, print passwords for physical storage in a safe, or have two “keepers” who each retain half of the password (nuclear launch code style). In my opinion, the best option for break glass credentials is to go passwordless. Spend the money and get yourself a FIDO2-compliant hardware key such as those from Yubico or FEITIAN. Store this hardware key somewhere safe and you’re home free – no password management overhead and hyper-secure sign-in for these accounts.

The second thing to keep in mind for break glass accounts is: they should NOT be used. As these accounts are generic – tied to the business, not a user – there isn’t always a way to attribute the actions a break glass account takes to a specific employee. This is a challenge for insider threat scenarios. If all your administrators have access to the account, how are you to know who intentionally deleted all your files with it after a really bad day?

Securely storing the credentials for a break glass account is the first way you prevent this from happening; the second is to alert on usage. If your business process somehow fails and the credentials leak, you get a rapid prompt that lets you know something may be going wrong.
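
As a simple illustration, here’s a sketch that polls the Azure AD sign-in logs for a break glass account via the Microsoft Graph .NET SDK (the UPN is a placeholder, and the caller needs the AuditLog.Read.All permission). In production you’d more likely wire this into an Azure Monitor or Sentinel alert rule, but the logic is the same – any sign-in from this account deserves attention:

using Azure.Identity;
using Microsoft.Graph;
using Microsoft.Graph.Models;

var graphClient = new GraphServiceClient(new DefaultAzureCredential());

// Placeholder - substitute your real break glass account UPN.
string breakGlassUpn = "breakglass@contoso.com";

// Any sign-in at all from this account is worth an alert,
// because a break glass account should never be used day to day.
var signIns = await graphClient.AuditLogs.SignIns.GetAsync(request =>
{
    request.QueryParameters.Filter =
        $"userPrincipalName eq '{breakGlassUpn}'";
    request.QueryParameters.Top = 10;
});

foreach (var signIn in signIns?.Value ?? new List<SignIn>())
{
    Console.WriteLine(
        $"ALERT: {signIn.UserPrincipalName} signed in at " +
        $"{signIn.CreatedDateTime} from {signIn.IpAddress}");
}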

5. Privileged Identity Management

Azure AD Privileged Identity Management, PIM for short, focuses on granting approved administrative access within a time-boxed period. It works by automatically removing administrative access when it isn’t required, and requiring approval with strong authentication to re-activate it. You can’t abuse an administrator account that has no admin privileges.

The PIM Process. Source: Robert Przybylski
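
To give a feel for the activation flow, here’s a hedged sketch of a PIM self-activation request using the Microsoft Graph .NET SDK. The principal ID and justification are placeholders, and the role definition ID is the well-known Global Administrator template ID:

using Azure.Identity;
using Microsoft.Graph;
using Microsoft.Graph.Models;

var graphClient = new GraphServiceClient(new DefaultAzureCredential());

// Ask PIM to activate an eligible role for one hour.
var activationRequest = new UnifiedRoleAssignmentScheduleRequest
{
    Action = UnifiedRoleScheduleRequestActions.SelfActivate,
    PrincipalId = "<your-user-object-id>",                     // placeholder
    RoleDefinitionId = "62e90394-69f5-4237-9190-012177145e10", // Global Administrator
    DirectoryScopeId = "/", // tenant-wide scope
    Justification = "Investigating ticket #1234",              // placeholder
    ScheduleInfo = new RequestSchedule
    {
        StartDateTime = DateTimeOffset.UtcNow,
        Expiration = new ExpirationPattern
        {
            Type = ExpirationPatternType.AfterDuration,
            Duration = TimeSpan.FromHours(1),
        },
    },
};

var result = await graphClient.RoleManagement.Directory
    .RoleAssignmentScheduleRequests.PostAsync(activationRequest);
Console.WriteLine($"Activation request status: {result?.Status}");

If the role is configured to require approval, the request sits pending until an approver signs off – which is exactly the friction point the tips below are about.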

Good PIM implementations are generally backed by strong business process. At the end of the day, identity is a people-centric technology, and sometimes real-world process needs to be considered. The following tips should help you design a decent PIM implementation, keeping your key stakeholders in mind.

  • Be pragmatic about Eligible vs Permanently assigned roles. Your corporate risk profile may allow some roles to be active all the time.
  • Have multiple approvers for each role. What if someone has a day off? You don’t want to block the business because you haven’t got an approver available.
  • Consider the time it takes to execute a common task. If admins have tasks which take two hours, but need to re-activate a role every hour, you’re simply adding frustration to people’s days.
  • Build a data driven review process. PIM provides rich reporting on usage and activation of roles, so use this to remove or grant further access at a periodic interval.

Finally, notice how this last item is the only one of the five that explicitly mentions privileged access in its name? This is because PIM provides the most benefit when used within a healthy and well-managed environment. In my opinion, taking the time to use your Azure AD P1 features before paying extra for an Azure AD P2 feature is the best approach. Consider the Microsoft guidance and your own strategy before making that decision, however.

Up Next

Thanks for sticking with me through this week’s guidance on Azure AD basics! If you’re after some more privileged access information, have a read of my strategy guidance, or stay tuned for more on what can be done using Azure AD, including some tips and techniques to help you stay in control. The topics this series is covering are:

  1. Strategy & Planning
  2. Azure AD Basics (This Post)
  3. Hybrid Scenarios
  4. Zero Trust
  5. Protecting Identity
  6. Staying in Control

Until next time, stay cloudy!

Securing Privileged Access with Azure AD (Part 1) – Strategy and Planning

I’ve been approached a few times recently on how best to govern and secure privileged access using the Microsoft stack. Often this conversation is with organizations who don’t have the need, budget or skillset to deploy a dedicated solution such as those from CyberArk, BeyondTrust or Thycotic. Understanding this, these organizations are looking to uplift security, but are pragmatic about doing it within the ecosystem they know and can manage. This series will focus on getting the most out of Azure AD, challenging your thinking on Azure AD capabilities and using the Microsoft ecosystem to extend into hybrid environments!

What is Privileged Access?

Before we dive too deep into the topic, it’s important to understand what exactly privileged access is. Personally, I believe a lot of organizations look at this in the wrong light. The simplest way to expand your understanding is by asking two questions.

  1. If someone unauthorized to see or use my solution/data had the ability to do so, would the impact to my business be negative?
  2. If the above occurred, how bad would it be?

The first question really focuses on the core of privileged access – it is a special right you grant your employees and partners, with the implicit trust that it won’t be abused. This question is useful because it doesn’t just focus on administrative access – a pitfall many organizations fall into – it also brings specialized access into scope. Question two is all about prioritizing the risk associated with each of your solutions. Understanding that intentional leakage of the organizational crown jewels matters more than someone having access to a single server will often allow you to be pragmatic with your focus in the early stages of your journey.

Access diagram showing the split between privileged and user access.
This Microsoft visual shows how user access & privileged access often overlap.

Building a Strategy

Understanding your strategy for securing privileged access is a critical task, and it should most definitely be distinct from any planning activities. Privileged access strategy is all about defining where to exert your effort over the course of your program. Having a short-term work effort aligned to a long-term light on the hill ensures that your PAM project doesn’t revisit covered ground.

To do this well, start by building an understanding of where your capabilities exist. Something as simple as location is adequate. For example, I might begin with: Azure Melbourne, Azure Sydney, Canberra datacenter and Unknown (SaaS & everything else).

From that initial understanding, you can begin to build out some detail, aligned to services or data. If you have a CASB service like Cloud App Security enabled, it can be a really good tool for gaining insights into what is used within your environment. Following this approach, our location-based data suddenly expands to: Azure IaaS/PaaS resources, the Azure control plane, SaaS application X, a data platform (storage accounts) and Palo Alto firewalls.

This list of services & data can then be used to build a list of the access which users have against each service. For IaaS/PaaS and SaaS app X, we have standard users and administrators. ARM and the data platform overlap for admin access, but the data platform also has user access. Our networking admins have access to the Palo Alto devices, but this service is transparent to our users.

Finally, build a matrix of impact, using risks to the identity and likelihood of occurrence, and use this data to prioritize where you will exert your effort. For example: a breach of my SaaS administrator account for a region isn’t too dangerous, because I’ve applied a zero trust network architecture – you cannot access customer data or another region from the service in question. I’ll move that access down in my strategy. My users with access to extremely business-sensitive data commonly click phishing emails. I’ll move that access up in my strategy.

How to gauge impact easily – which version of the CEO would you be seeing if control of this privileged access was lost?
Source: Twitter

This exercise is really important, because we have now begun to build our understanding of where the value is. Based on this, a short PAM strategy could be summarized like so:

  1. Apply standard controls for all privileged users, decreasing the risk of account breach.
  2. Manage administrative accounts controlling identity, ensuring that access is appropriate, time-bound and audited.
  3. Manage user accounts with access to key data, ensuring that key access is appropriate, reviewed regularly and monitored for misuse.
  4. Manage administrative accounts controlling infrastructure with key data.
  5. Apply advanced controls to all privileged users, enhancing the business process aligned to this access.
  6. Manage administrative accounts with access to isolated company data (no access from service to services).

My overarching light on the hill for all of this could be summarized as: “Secure my assets, with a focus on business-critical data, enhancing the security of ALL assets in my organization.”

Planning your Solutions

After you have developed your strategy, it’s important to build a plan for how to implement each strategic goal. This is really focused on each building block you want to apply and the technology choices you are going to make. Notice how the above strategy did not focus on how we were going to achieve each item. My favourite thing about this process is that everything overlaps! Developing good controls in one area will help secure another, because identity controls generally cover the entire user base.

The easiest way to plan solutions is to build out a controls matrix for each strategic goal. As an example,

Apply Standard Controls for all privileged users

Could very quickly be mapped out to the following basic controls:

Solution | Control | Purpose
Conditional Access | Multi-Factor Authentication | Works to prevent password spray, brute force and phishing attacks. High-quality MFA design combined with attentive users can prevent 99.9% of identity-based attacks.
Conditional Access | Sign-in Geo Blocking | Administration should be completed only from our home country. Force this behaviour by blocking access from other locations.
Azure AD Password Protection | Password Policy | While we hope that our administrators don’t use Summer2021 as a password, we can sleep easy knowing this will be prevented by a technical control.

These control mappings can be as complex or as simple as needed. As a general recommendation, starting small will allow you to achieve high coverage early; from there you can re-cover the same area with deeper, more advanced controls over time. Rinse and repeat this process for each of your strategic goals, and you should quickly find that you have a solution for the entire strategy you developed!

Up Next

If you’ve stuck with me for this long, thank you! Securing privileged access really is a critical process for any cyber security program. Hopefully you’re beginning to see some value in really expanding out the strategy and planning phase of your next privileged access management project. Over the next few posts, I’ll elaborate on what can be done using Azure AD, and share some tips and techniques to help you stay in control. The topics we will cover are:

  1. Strategy & Planning (This Post)
  2. Azure AD Basics
  3. Hybrid Scenarios
  4. Zero Trust
  5. Protecting Identity
  6. Staying in Control

Until next time, stay cloudy!

Originally Posted on arinco.com.au