Azure Content Understanding GA: One Service to Replace Them All

Recently I was putting together a demo for an intelligent document processing pipeline. The kind where invoices, contracts, and identity documents pour in from a dozen different sources, and you need structured data out the other end. Pretty standard stuff in 2025. The catch? I had Document Intelligence for the PDFs, Azure AI Vision for scanned images, Speech for audio transcripts, and a tangle of custom glue code stitching it all together. Three SDKs, three billing models, three sets of quotas, and one increasingly frustrated bloke (me) trying to keep it all in sync.

Then Microsoft dropped Azure Content Understanding at Ignite 2025, and I immediately had that “wait, this replaces all of it?” moment. Spoiler: yes, it does. One API, one analyser model, documents, images, audio, and video. Let’s dive in!

In English, Please?

Content Understanding is a Foundry Tool that uses generative AI to process unstructured content of any modality (documents, images, audio, video) and transform it into structured, schema-defined output. Think of it as the Swiss Army knife that replaces your drawer full of specialised tools.

The core abstraction is the analyser. An analyser is a reusable configuration that defines how your content gets processed: what content extraction to perform (OCR, layout, transcription), what fields to extract, and which generative model to use. You create an analyser once, then throw files at it. The service handles the rest.

There are three extraction methods you can use on any field:

  • Extract: pull values directly as they appear in the document (dates, amounts, names)
  • Generate: use the LLM to synthesise or summarise information (document summaries, scene descriptions)
  • Classify: categorise content against a predefined set of options (document type, sentiment, chart type)

The killer feature is that you can mix all three methods in a single analyser. Extract an invoice number, generate a summary, and classify the document type, all in one API call. And every extracted field comes with confidence scores and source grounding that traces back to exactly where in the document each value was found. That’s the kind of traceability that makes compliance teams very happy.

What Shipped at GA?

The November 2025 GA release (API version 2025-11-01) is substantial. Here’s what landed:

Bring Your Own Foundry Model: This was the single biggest request during preview. You can now connect Content Understanding to your own Foundry model deployment, choosing GPT-4.1, GPT-4.1-mini, or whichever model suits your quality and cost requirements. Pay-as-you-go or Provisioned Throughput, your call.

RAG-optimised analysers: Four new prebuilt analysers designed specifically for search and retrieval scenarios:

  • prebuilt-documentSearch: extracts paragraphs, tables, figures (with descriptions), handwritten annotations, and generates document summaries. This one converts charts into chart.js syntax and diagrams into mermaid.js, which is frankly brilliant for making visual content searchable.
  • prebuilt-videoSearch: transcript extraction with automatic scene detection and per-segment summaries
  • prebuilt-audioSearch: conversation transcription with speaker diarisation and multilingual support
  • prebuilt-imageSearch: visual content descriptions and insights

Domain-specific prebuilt analysers: A huge catalogue of industry-specific analysers covering finance and tax (invoices, receipts, W-2s, the full 1099 family, 1098 series), mortgage and lending (Form 1003, 1004, 1005, closing disclosures), identity verification (passports, driver’s licences, ID cards worldwide), procurement and contracts, and utilities billing. Over 70 prebuilt analysers out of the box.

Classification baked into the analyser: No more separate classifier API. You can define up to 200 content categories within a single analyser using the contentCategories property. The service will automatically segment a multi-document file, classify each piece, and route it to the appropriate downstream analyser. In my opinion, this is one of the most practical features for real-world document processing where you receive a mixed bag of scanned documents in a single PDF.
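To make that concrete, here's a rough sketch of what a classification-enabled analyser might look like. Heavy caveat: I'm guessing the SDK property names (ContentCategories, ContentCategory, AnalyzerId) from the REST-level contentCategories property; check the official reference for the exact shapes before copying this anywhere.

```csharp
using Azure.AI.ContentUnderstanding.Models;

// Hypothetical sketch: an analyser that segments a mixed scanned PDF,
// classifies each piece, and routes it to a downstream analyser.
// Type and property names are assumptions based on the REST contentCategories
// property, not verified SDK surface.
var splitter = new ContentAnalyzer("prebuilt-document")
{
    Description = "Splits mixed scans into invoices, receipts, and the rest",
    ContentCategories =
    {
        ["invoice"] = new ContentCategory
        {
            Description = "A bill requesting payment for goods or services",
            AnalyzerId = "prebuilt-invoice" // route to the invoice analyser
        },
        ["receipt"] = new ContentCategory
        {
            Description = "Proof of a completed purchase",
            AnalyzerId = "prebuilt-receipt"
        },
        ["other"] = new ContentCategory
        {
            Description = "Anything that doesn't match the categories above"
        }
    }
};
```

The routing idea is the point here: one front-door analyser fans out to specialised ones, replacing what used to be a separate classifier API plus orchestration code.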

Content extraction upgrades: Multi-page tables extracted as a single logical unit (finally), hyperlink extraction, barcode detection, figure extraction as chart.js or mermaid.js, and annotation detection for highlights, underlines, and strikethroughs in digital PDFs.

Enterprise security: Entra ID, managed identities, customer-managed keys, VNets, and private endpoints. Available across 14 regions worldwide at GA.

As the official announcement on the Foundry Blog puts it, Content Understanding is now the recommended starting point for all new file-processing workloads. Document Intelligence isn’t disappearing overnight, but the direction of travel is clear.

Show Me the Code

Let’s walk through a practical example. Say you want to extract structured fields from invoices. First, install the GA .NET SDK (shipped March 2026):

dotnet add package Azure.AI.ContentUnderstanding

Now set up the client:

using Azure;
using Azure.AI.ContentUnderstanding;

string endpoint = Environment.GetEnvironmentVariable("CONTENTUNDERSTANDING_ENDPOINT")!;
string key = Environment.GetEnvironmentVariable("CONTENTUNDERSTANDING_KEY")!;

ContentUnderstandingClient client = new(
    new Uri(endpoint),
    new AzureKeyCredential(key));
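A quick aside on authentication: since the GA release supports Entra ID, you can skip API keys entirely. Assuming the client follows the standard Azure SDK convention of accepting a TokenCredential (I haven't verified the exact constructor overload, so treat this as a sketch), it looks like this:

```csharp
using Azure.Identity;
using Azure.AI.ContentUnderstanding;

// Assumed TokenCredential overload, per the usual Azure SDK conventions.
// DefaultAzureCredential resolves to a managed identity when running in
// Azure, or to your developer sign-in (az login, Visual Studio) locally.
ContentUnderstandingClient client = new(
    new Uri(endpoint),
    new DefaultAzureCredential());
```

For production workloads this is the route I'd take: no keys to rotate, and it plays nicely with the managed identity and private endpoint story mentioned above.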

The simplest path is using a prebuilt analyser. Here’s how to process an invoice with the prebuilt-invoice analyser:

using Azure.AI.ContentUnderstanding.Models;

string invoiceUrl =
    "https://raw.githubusercontent.com/" +
    "Azure-Samples/" +
    "azure-ai-content-understanding-assets/" +
    "main/document/invoice.pdf";

var operation = await client.AnalyzeAsync(
    WaitUntil.Completed,
    analyzerId: "prebuilt-invoice",
    inputs: [new AnalysisInput(new Uri(invoiceUrl))]);

AnalysisResult result = operation.Value;

// Get the document content (Contents can hold multiple modalities,
// so cast the base content type to DocumentContent)
DocumentContent documentContent = (DocumentContent)result.Contents[0];

// Extract key fields with confidence scores
if (documentContent.Fields.TryGetValue("CustomerName", out ContentField? customerName))
{
    Console.WriteLine($"Customer Name: {customerName.Value}");
    if (customerName.Confidence.HasValue)
    {
        Console.WriteLine($"  Confidence: {customerName.Confidence.Value:F2}");
    }
}

if (documentContent.Fields.TryGetValue("InvoiceDate", out ContentField? invoiceDate))
{
    Console.WriteLine($"Invoice Date: {invoiceDate.Value}");
}

// Extract line items (array of objects)
if (documentContent.Fields.TryGetValue("LineItems", out ContentField? lineItems)
    && lineItems is ArrayField arrayField && arrayField.Value is not null)
{
    Console.WriteLine($"\nLine Items ({arrayField.Value.Count}):");
    int i = 1;
    foreach (ContentField item in arrayField.Value)
    {
        if (item is ObjectField objectField && objectField.Value is not null)
        {
            objectField.Value.TryGetValue("Description", out ContentField? desc);
            objectField.Value.TryGetValue("Quantity", out ContentField? qty);
            Console.WriteLine($"  Item {i}: {desc?.Value ?? "N/A"}");
            Console.WriteLine($"    Quantity: {qty?.Value ?? "N/A"}");
        }
        i++;
    }
}

Pretty straightforward, right? One API call, structured output with confidence scores, no prompt engineering required.

But the real power is when you build a custom analyser with your own schema. Here’s an example that extracts company information, generates a summary, and classifies the document type, all in one shot:

using Azure.AI.ContentUnderstanding.Models;

string analyzerId = "my-invoice-analyzer";

var fieldSchema = new ContentFieldSchema("company_schema")
{
    Description = "Schema for extracting company information",
    Fields =
    {
        ["company_name"] = new ContentFieldDefinition(ContentFieldType.String)
        {
            Method = GenerationMethod.Extract,
            Description = "Name of the company",
            EstimateSourceAndConfidence = true
        },
        ["total_amount"] = new ContentFieldDefinition(ContentFieldType.Number)
        {
            Method = GenerationMethod.Extract,
            Description = "Total amount on the document",
            EstimateSourceAndConfidence = true
        },
        ["document_summary"] = new ContentFieldDefinition(ContentFieldType.String)
        {
            Method = GenerationMethod.Generate,
            Description = "A brief summary of the document content"
        },
        ["document_type"] = new ContentFieldDefinition(ContentFieldType.String)
        {
            Method = GenerationMethod.Classify,
            Description = "Type of document",
            Enum = { "invoice", "receipt", "contract", "report", "other" }
        }
    }
};

var config = new ContentAnalyzerConfig()
{
    EnableLayout = true,
    EnableOcr = true,
    EstimateFieldSourceAndConfidence = true,
    ReturnDetails = true
};

var analyzer = new ContentAnalyzer("prebuilt-document") // base analyser to build on
{
    Description = "Custom analyser for company document processing",
    Config = config,
    FieldSchema = fieldSchema,
    Models =
    {
        ["completion"] = "gpt-4.1",
        ["embedding"] = "text-embedding-3-large"
    }
};

// Create the analyser (one-time setup)
var operation = await client.CreateAnalyzerAsync(
    WaitUntil.Completed,
    analyzerId: analyzerId,
    resource: analyzer);

Console.WriteLine($"Analyser '{analyzerId}' created successfully!");

Note: The Models dictionary is where you specify your Foundry model deployments. Content Understanding uses your deployed models for the generative features (field extraction, figure analysis), so you’re in full control of cost and quality. The service itself only charges for content extraction and contextualisation; the LLM token costs come from your own deployment.
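With the analyser created, running it is the same AnalyzeAsync call as the prebuilt example, just pointed at your own analyser ID. A sketch, reusing the invoiceUrl from earlier and the same cast-to-DocumentContent assumption as before:

```csharp
// Analyse a document with the custom analyser we just created
var analyzeOp = await client.AnalyzeAsync(
    WaitUntil.Completed,
    analyzerId: "my-invoice-analyzer",
    inputs: [new AnalysisInput(new Uri(invoiceUrl))]);

DocumentContent content = (DocumentContent)analyzeOp.Value.Contents[0];

// Extracted, generated, and classified fields all come back together,
// keyed by the names we defined in the schema (company_name,
// total_amount, document_summary, document_type)
foreach ((string name, ContentField field) in content.Fields)
{
    Console.WriteLine($"{name}: {field.Value}");
}
```

That's the mix-and-match promise from earlier in action: one call returns the extracted values, the LLM-generated summary, and the classification result side by side.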

Slotting into AI Search (and the Bigger Picture)

If you’re building RAG applications, this is where Content Understanding really shines. There’s a dedicated Content Understanding skill for Azure AI Search that replaces the older Document Layout skill with some significant advantages:

  • Tables and figures output as Markdown (not plain text), so LLMs can actually reason over them
  • Multi-page tables extracted as a single unit rather than page by page
  • Chunks can span multiple pages, so semantic units like cross-page tables aren’t artificially split
  • More cost effective than the Document Layout skill

The architecture pattern is elegant: Content Understanding processes your documents into clean, structured Markdown with extracted fields. AI Search indexes it for retrieval. Foundry Agents query the index to answer user questions. Three services, one coherent pipeline, and no custom glue code required.
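If you want to preview that Markdown without standing up a full skillset, you can call prebuilt-documentSearch directly and inspect the output. One assumption to flag: I'm guessing the content object exposes a Markdown property, so verify the exact member name against the SDK reference.

```csharp
// Run the RAG-optimised document analyser over the same invoiceUrl as before
var searchOp = await client.AnalyzeAsync(
    WaitUntil.Completed,
    analyzerId: "prebuilt-documentSearch",
    inputs: [new AnalysisInput(new Uri(invoiceUrl))]);

DocumentContent doc = (DocumentContent)searchOp.Value.Contents[0];

// Tables and figures arrive as Markdown, ready for chunking and indexing.
// The Markdown property name is my assumption based on the service's
// documented markdown output; check the SDK reference for the exact name.
Console.WriteLine(doc.Markdown);
```

Dumping this for a few representative documents is a cheap way to sanity-check chunk quality before you wire the skill into an indexer.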

Be warned: the Content Understanding skill doesn’t include the 20-free-documents-per-indexer-per-day allowance that other AI skills get. Every document processed is billed at the Content Understanding price, so factor that into your cost estimates, which, as regular readers will know, I always keep a close eye on.

What About Document Intelligence?

This is the question everyone’s asking. Document Intelligence isn’t being switched off tomorrow. The prebuilt-read and prebuilt-layout analysers in Content Understanding bring the key Document Intelligence capabilities forward, so you have a migration path. In my opinion, if you’re starting a new project, go straight to Content Understanding. If you have an existing Document Intelligence deployment that’s working well, there’s no urgent need to rip and replace, but you should start planning the migration.

The SDKs went GA in March 2026 for .NET, Python, Java, and JavaScript/TypeScript, all targeting the 2025-11-01 API version. That’s your signal that this is production-ready and Microsoft’s go-forward investment.

Wrapping Up

Content Understanding is one of those services that makes you wonder why it took so long. The unified multimodal API, the analyser abstraction, the BYO model flexibility, and the deep integration with AI Search and Foundry Agents, it all just fits together. If you’ve been juggling Document Intelligence, Vision, and Speech in a single pipeline, this is your consolidation play.

As always, feel free to reach out with any questions or comments!

Until next time, stay cloudy!
