Recently I was putting together a RAG proof of concept for a demo and set myself a deceptively simple requirement: “Ask one question and get an answer that pulls from a SharePoint knowledge base, product docs in blob storage, and the public web.” Except the question was something like “Compare our enterprise SLA response times against industry benchmarks, and flag contracts expiring in Q1 that have custom terms.” That’s not one question. That’s at least three.
My traditional RAG pipeline dutifully fired a single hybrid query at the index, got back chunks vaguely related to SLAs, and completely missed the contract expiry angle. I spent a week building custom query decomposition logic and a result merging step. Then at Ignite 2025, Microsoft announced that Azure AI Search now does all of this natively. Could have saved myself a week and a few grey hairs.
This post covers Azure AI Search’s new agentic retrieval pattern, how Foundry IQ plugs it into your agents, and why document-level access control makes it enterprise-ready. Let’s dive in!
Traditional RAG vs. Agentic RAG: In English Please?
Traditional RAG works like this: user asks a question, you embed it, run a vector (or hybrid) search, grab the top-k chunks, stuff them into a prompt, and send it to the LLM. One question, one query, one shot.
This works brilliantly for straightforward questions. “What’s our refund policy?” Nailed it. But the moment a user asks something compound, a single query falls apart. The embedding can’t capture all the facets. The top-k results skew towards one aspect of the question. And the LLM hallucinates the rest.
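To make the failure mode concrete, here's a toy sketch of the single-shot flow. The hard-coded 2-d vectors stand in for a real embedding model and index, so the numbers are purely illustrative:

```csharp
using System;
using System.Linq;

// Toy single-shot RAG: one query embedding, one top-k pass.
var corpus = new (string Chunk, double[] Vec)[]
{
    ("Enterprise SLA: 1-hour response for Sev-1.", new[] { 0.9, 0.1 }),
    ("SLA credits are issued quarterly.",          new[] { 0.8, 0.2 }),
    ("Contract 42 expires in Q1, custom terms.",   new[] { 0.1, 0.9 }),
};

// The compound question collapses into a single embedding that
// leans towards the SLA facet.
double[] query = { 0.95, 0.05 };

static double Cosine(double[] a, double[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// Top-2 chunks are both SLA-flavoured; the contract-expiry facet never surfaces.
var topK = corpus.OrderByDescending(c => Cosine(c.Vec, query)).Take(2).ToList();
topK.ForEach(c => Console.WriteLine(c.Chunk));
```

No matter how many chunks you retrieve, one embedding can only point in one direction, and a compound question points in several.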
Agentic RAG flips the model. Instead of one query, an LLM sits inside the retrieval layer and:
- Decomposes the complex question into focused sub-queries (e.g., “SLA response times for enterprise tier” + “industry SLA benchmarks” + “contracts expiring Q1 with custom terms”)
- Executes all sub-queries in parallel across multiple knowledge sources
- Reranks each result set using semantic ranking
- Synthesises a unified response with citations back to source documents
The intelligence moves into the retrieval pipeline itself, not just the final generation step. Microsoft’s benchmarks show approximately 36% higher response quality compared to traditional single-shot RAG. In my experience with compound questions, the improvement is even more dramatic.
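The decompose-and-fan-out idea is easy to picture in code. This is not the service's implementation, just a minimal sketch with a stubbed `SearchAsync` standing in for a real index round-trip:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Sub-queries produced by the planner for the compound question above.
var subQueries = new[]
{
    "SLA response times for enterprise tier",
    "industry SLA benchmarks",
    "contracts expiring Q1 with custom terms",
};

// Stub for a real hybrid search call.
static async Task<string[]> SearchAsync(string query)
{
    await Task.Delay(10);
    return new[] { $"chunk for: {query}" };
}

// Execute every sub-query concurrently, then flatten and dedupe
// before the results reach reranking and synthesis.
string[][] resultSets = await Task.WhenAll(subQueries.Select(SearchAsync));
var merged = resultSets.SelectMany(r => r).Distinct().ToList();
merged.ForEach(Console.WriteLine);
```

Each facet of the question gets its own focused query, so no single embedding has to carry all three.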
Knowledge Bases and Knowledge Sources: The Building Blocks
Azure AI Search introduced two new objects: knowledge bases and knowledge sources. These landed in the 2025-05-01-preview API and were significantly updated at Ignite 2025 with the 2025-11-01-preview.
A knowledge source is a connection to your data. You can connect to search indexes, Azure Blob Storage (including ADLS Gen2), OneLake, SharePoint (both indexed and remote modes), and web sources. For indexed sources, the platform automatically handles chunking, vector embedding generation, and metadata extraction. Point it at a container or site and it does the heavy lifting.
A knowledge base sits on top of one or more knowledge sources and orchestrates the agentic retrieval pipeline. It connects to an Azure OpenAI model (gpt-4o, gpt-4.1, or gpt-5 series) for query planning and controls retrieval behaviour.
Note: The terminology shifted during preview. “Knowledge agents” (August 2025) were renamed to “knowledge bases” (November 2025). If you built on the earlier preview, check the migration guide for breaking changes.
Here’s what creating a knowledge base looks like in C#:
```csharp
using Azure;
using Azure.Identity;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

// Configure the LLM for query planning
var aoaiParams = new AzureOpenAIVectorizerParameters
{
    ResourceUrl = new Uri("https://your-foundry-resource.openai.azure.com"),
    DeploymentName = "gpt-4o",
    ModelName = "gpt-4o",
};

// Create the knowledge base
var knowledgeBase = new KnowledgeBase("enterprise-support-kb")
{
    Description = "Multi-source knowledge base for enterprise support queries.",
    RetrievalInstructions = "Use the product-docs source for technical questions, "
        + "the sharepoint source for internal policies, "
        + "and the web source for industry benchmarks.",
    AnswerInstructions = "Provide a concise answer with citations to source documents.",
    OutputMode = KnowledgeRetrievalOutputMode.AnswerSynthesis,
    KnowledgeSources =
    {
        new KnowledgeSourceReference("product-docs-ks"),
        new KnowledgeSourceReference("sharepoint-policies-ks"),
        new KnowledgeSourceReference("web-benchmarks-ks"),
    },
    Models = { new KnowledgeBaseAzureOpenAIModel(aoaiParams) },
    RetrievalReasoningEffort = KnowledgeRetrievalReasoningEffort.Low,
};

var credential = new DefaultAzureCredential();
var indexClient = new SearchIndexClient(
    new Uri("https://your-search-service.search.windows.net"),
    credential);

await indexClient.CreateOrUpdateKnowledgeBaseAsync(knowledgeBase);
```
A few things worth calling out. `RetrievalInstructions` is natural-language guidance for the query planner. `OutputMode` is either `ExtractiveData` (raw ranked chunks) or `AnswerSynthesis` (the LLM generates a grounded answer with citations inside the pipeline). And `RetrievalReasoningEffort` controls LLM processing depth: `Minimal` (no LLM), `Low`, or `Medium`. Higher effort means better decomposition but more latency and token cost.
Querying: Where the Magic Happens
Once your knowledge base is set up, querying it is pretty straightforward. Send a conversation history and the pipeline does the rest:
```csharp
using Azure.Identity;
using Azure.Search.Documents.KnowledgeBases;
using Azure.Search.Documents.KnowledgeBases.Models;

var credential = new DefaultAzureCredential();
var kbClient = new KnowledgeBaseRetrievalClient(
    new Uri("https://your-search-service.search.windows.net"),
    "enterprise-support-kb",
    credential);

var request = new KnowledgeBaseRetrievalRequest
{
    Messages =
    {
        new KnowledgeBaseMessage(KnowledgeBaseMessageRole.User)
        {
            Content =
            {
                new KnowledgeBaseMessageTextContent(
                    "Compare our enterprise SLA response times against "
                    + "industry benchmarks, and flag contracts expiring in "
                    + "Q1 that have custom terms.")
            }
        }
    },
    KnowledgeSourceParams =
    {
        new SearchIndexKnowledgeSourceParams("product-docs-ks")
        {
            IncludeReferences = true,
            IncludeReferenceSourceData = true,
        }
    },
    IncludeActivity = true,
};

var result = await kbClient.RetrieveAsync(request);
Console.WriteLine(result.Value.Response[0].Content[0].Text);
```
The response comes back in three parts, and this is one of my favourite design decisions. You get the response (raw grounding data or a synthesised answer), an activity array showing the full query plan (sub-queries generated, token counts, elapsed time per source), and a references array with links back to source documents. Full transparency, no black box.
In my opinion, this observability is what separates a production-ready RAG system from a demo. You can see exactly how the LLM decomposed your query, which sources it hit, and what it cost.
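Continuing the example above, dumping the plan and citations is a loop each. One caveat: I'm assuming the preview SDK surfaces the REST payload's activity and references arrays as `Activity` and `References` collections, so check the property names against the SDK version you're on:

```csharp
// ASSUMPTION: Activity/References property names mirror the REST payload;
// verify against your Azure.Search.Documents preview version.
var value = result.Value;

// One entry per planner/search step: sub-query text, token counts, elapsed time.
foreach (var step in value.Activity)
    Console.WriteLine(step);

// Citations linking the answer back to source documents.
foreach (var reference in value.References)
    Console.WriteLine(reference);
```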
Foundry IQ: Connecting Agents to Knowledge
Foundry IQ is the integration layer that connects Foundry Agent Service agents directly to AI Search knowledge bases. Instead of writing custom tool code, you wire up the knowledge base as a tool and the agent handles the rest.
Each knowledge base automatically exposes an MCP endpoint:
```
https://<search-service>.search.windows.net/knowledgebases/<kb-name>/mcp?api-version=2025-11-01-preview
```
Because it speaks Model Context Protocol, any MCP-compatible client can connect: Foundry Agent Service, GitHub Copilot, Claude, Cursor, you name it. The knowledge base exposes a `knowledge_base_retrieve` tool that agents invoke like any other tool call.
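If you want to poke the endpoint without an agent framework, a raw JSON-RPC call looks roughly like this. The `tools/call` envelope is standard MCP; the `arguments` shape shown is my assumption, so call `tools/list` first to see the real schema:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Raw MCP call against the knowledge base endpoint.
// ASSUMPTION: the "arguments" shape below - inspect tools/list for the real schema.
var http = new HttpClient();
http.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", "<entra-access-token>");

var payload = """
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "knowledge_base_retrieve",
    "arguments": { "query": "What is our enterprise SLA response time?" }
  }
}
""";

var response = await http.PostAsync(
    "https://your-search-service.search.windows.net/knowledgebases/enterprise-support-kb/mcp"
    + "?api-version=2025-11-01-preview",
    new StringContent(payload, Encoding.UTF8, "application/json"));

Console.WriteLine(await response.Content.ReadAsStringAsync());
```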
The real benefit is separation of concerns. The agent focuses on reasoning and tool orchestration. The knowledge base handles retrieval: query decomposition, parallel execution, reranking, and citation extraction. Much cleaner than embedding retrieval logic directly in your agent code.
For teams already on Microsoft Foundry, setup is straightforward: in the Foundry portal, open the Knowledge tab, connect your search service, create a knowledge base, then flip to Agents and link it. The Foundry IQ blog post from the Azure AI Search team walks through the portal experience.
Document-Level Access Control: The Enterprise Requirement
None of this matters if everyone can see everything. The question isn’t just “can the agent find the right answer?” but “should this user see that answer?”
Document-level access control solves this by flowing permissions from data sources into the index and enforcing them at query time. As of November 2025 there are four approaches:
- Security filters (stable): String-based matching on user/group IDs at query time
- POSIX-like ACL / RBAC scopes (preview): Native ADLS Gen2 ACL support; pass the user's Entra token via the `x-ms-query-source-authorization` header and results are automatically trimmed
- SharePoint ACLs (preview): Permissions extracted directly from Microsoft 365 ACLs during indexing
- Purview sensitivity labels (preview): Labels extracted from Purview and enforced at query time based on the user’s Entra token
Be warned: the SharePoint ACL and Purview features are preview-only via the 2025-11-01-preview REST API. Not production-ready yet, but the direction is clear: permissions follow your documents through indexing, retrieval, and agentic response generation.
For agentic retrieval, the Blob/ADLS Gen2 knowledge source supports ACL ingestion via `ingestionPermissionOptions`. When a Foundry agent queries on behalf of a user, results are automatically scoped to what that user is allowed to see.
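Over raw REST, wiring the user's token through looks roughly like this. The `x-ms-query-source-authorization` header name comes from the preview docs; treat the exact retrieve path and body shape as assumptions and verify them against the 2025-11-01-preview reference:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Security-trimmed retrieval: the end user's Entra token rides along in
// x-ms-query-source-authorization so results are scoped to their permissions.
// ASSUMPTION: the /retrieve path and body shape - verify for your API version.
var http = new HttpClient();
http.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", "<service-access-token>");
http.DefaultRequestHeaders.Add(
    "x-ms-query-source-authorization", "<end-user-entra-token>");

var body = """
{ "messages": [ { "role": "user",
    "content": [ { "type": "text", "text": "Flag Q1 contracts with custom terms" } ] } ] }
""";

var response = await http.PostAsync(
    "https://your-search-service.search.windows.net/knowledgebases/enterprise-support-kb/retrieve"
    + "?api-version=2025-11-01-preview",
    new StringContent(body, Encoding.UTF8, "application/json"));

Console.WriteLine(response.StatusCode);
```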
Content Understanding Skill: Better Ingestion
Quick mention of the Content Understanding skill that also shipped at Ignite 2025. If you’re ingesting complex documents (PDFs with cross-page tables, PowerPoints, embedded images), this is a big upgrade over the Document Layout skill. Cross-page tables come out as a single unit, output is Markdown (which LLMs handle far better), and chunks can span page boundaries. Enable it via `contentExtractionMode` in `ingestionParameters`. For the deep dive on Content Understanding itself, check the previous post in this series.
So What Does This Cost?
Agentic anything sounds expensive, so let me address this head-on. There are two billing dimensions:
- Azure OpenAI tokens for query planning (and answer synthesis if enabled). Pay-as-you-go based on the model you assign. Using gpt-4o-mini keeps costs low.
- Azure AI Search agentic reasoning tokens: 50 million free tokens per month on Free tier, then pay-as-you-go on Standard.
Microsoft’s cost estimation example works out to roughly $4.32 USD for 2,000 agentic retrievals with three sub-queries each. Pretty reasonable. You can optimise further by lowering the reasoning effort (minimal skips the LLM entirely) and consolidating knowledge sources.
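Back-of-envelope, using only the numbers above:

```csharp
using System;

// Microsoft's worked example: $4.32 for 2,000 retrievals, 3 sub-queries each.
double totalUsd = 4.32;
int retrievals = 2_000;

Console.WriteLine($"Per retrieval: ${totalUsd / retrievals:F5}");       // $0.00216
Console.WriteLine($"Per sub-query: ${totalUsd / (retrievals * 3):F5}"); // $0.00072
```

A fifth of a cent per compound question is cheap compared to a week of custom decomposition code.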
Note: Semantic ranker is a hard dependency. Disable it and agentic retrieval stops working. Make sure your pricing tier supports it.
Wrapping Up
Agentic retrieval is, in my opinion, the most significant capability Azure AI Search has shipped since vector search went GA. It moves intelligence into the retrieval layer where it belongs. Foundry IQ makes it trivially easy to wire into your agents, and document-level access control (while still maturing) is heading in exactly the right direction for enterprise adoption.
If you’re building RAG on Azure today, spin up a knowledge base on the free tier and throw your most complex user queries at it. The difference is noticeable.
Until next time, stay cloudy!