Building a RAG System with Microsoft.Extensions.VectorData

April 22, 2026

dotnet ai architecture

This post is based on my talk at Dotnet Georgia.

I wanted to turn that talk into a written version focused on the practical side: what RAG is, why it matters, and how to approach it in .NET with Microsoft.Extensions.VectorData.

Even the smartest AI models do not know who Lacrimosa is. He is my cat. But once I give the model access to my own data through a RAG pipeline, it can answer that question correctly. RAGrimosa is a small .NET sample that shows how to make that work in practice.

LLMs Are Useful, but Not Reliable by Default

LLMs changed how we build software. They are good at working with language, summarizing information, generating code, and turning rough input into something structured.

That part is real. The part people often skip is the limitation.

An LLM is not a live system of record. It is a model trained on past data. That means it can sound confident while being outdated, incomplete, or simply wrong.

In practice, the usual failure modes look like this:

the answer is based on stale information
the model invents details that sound plausible
the reasoning looks clean, but the result is still wrong

This is where patterns like RAG become useful. Instead of asking the model to answer from training data alone, we give it relevant context at request time.

Before getting into RAG, it helps to cover the building blocks behind it.

Vectors

A vector is just a list of numbers. In AI systems, those numbers represent some features of the original data. For text, that usually means semantic meaning rather than exact wording.

The important idea is distance. If two vectors are close to each other, the source data is usually similar as well.
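To make "close" concrete, here is a minimal sketch of cosine similarity, one common way to compare vectors, in plain C#. The three-dimensional vectors and their names are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```csharp
using System;

// Cosine similarity: close to 1.0 when two vectors point the same way,
// close to 0.0 when they are unrelated. Assumes neither vector is all zeros.
static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical toy "embeddings", hand-written for this example.
var cat = new[] { 0.9f, 0.1f, 0.0f };
var kitten = new[] { 0.8f, 0.2f, 0.1f };
var invoice = new[] { 0.0f, 0.1f, 0.9f };

Console.WriteLine($"cat vs kitten:  {CosineSimilarity(cat, kitten):F3}");
Console.WriteLine($"cat vs invoice: {CosineSimilarity(cat, invoice):F3}");
```

The first score comes out far higher than the second, which is all "similar" means at this level.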

Embeddings

Embeddings are vectors created from real-world data such as text, code, images, or audio. They capture meaning in a way that makes similarity search possible.

For example, two sentences can use different words and still end up with similar embeddings if they mean roughly the same thing.

Vector stores

Vector stores are databases built for this kind of search. They let you store embeddings and retrieve the nearest matches quickly.

That matters because RAG depends on finding relevant context fast enough to use it during a request. Instead of matching exact words, you search by meaning.
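Conceptually, a vector store answers one question: which stored vectors are nearest to this one? The sketch below does that by brute force over an in-memory list. It is only an illustration of the idea, not how pgvector or any real store works internally; real stores use indexes so they do not have to scan every row. The vectors and the `Search` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Brute force: score every stored vector against the query, keep the best `top`.
static List<(string Text, double Score)> Search(
    IReadOnlyList<(string Text, float[] Vector)> items, float[] query, int top) =>
    items
        .Select(item => (item.Text, Score: Cosine(item.Vector, query)))
        .OrderByDescending(hit => hit.Score)
        .Take(top)
        .ToList();

// Hand-written toy vectors standing in for real embeddings.
var knowledgeBase = new List<(string Text, float[] Vector)>
{
    ("Lacrimosa is a cat.", new[] { 0.9f, 0.1f, 0.0f }),
    ("Cats sleep most of the day.", new[] { 0.8f, 0.3f, 0.1f }),
    ("The invoice is due on Friday.", new[] { 0.0f, 0.2f, 0.9f }),
};

// Pretend this is the embedding of "Who is Lacrimosa?".
var queryVector = new[] { 0.85f, 0.15f, 0.05f };

foreach (var (text, score) in Search(knowledgeBase, queryVector, top: 2))
    Console.WriteLine($"{score:F3}  {text}");
```

The two cat sentences come back and the invoice does not, even though the query shares no exact wording requirement with them.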

RAG

RAG stands for Retrieval-Augmented Generation.

The idea is simple: before the model answers, the system retrieves relevant information from an external source such as documents, database records, or internal knowledge bases. That retrieved context is then included in the prompt.

A typical flow looks like this:

a user asks a question
the system searches for relevant content
the relevant content is attached to the model input
the model answers using both the prompt and the retrieved context
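The steps above can be sketched as a single function. Retrieval and generation are passed in as delegates so the flow does not depend on a particular store or model. The `Answer`, `FakeRetrieve`, and `FakeGenerate` names are hypothetical stand-ins: the retriever uses naive keyword matching instead of vector search, and the "model" simply echoes the prompt so you can see what it would receive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static string Answer(
    string question,
    Func<string, int, IReadOnlyList<string>> retrieve,
    Func<string, string> generate)
{
    // Search for relevant content (top 3 is an arbitrary choice).
    var context = retrieve(question, 3);

    // Attach the retrieved content to the model input.
    var prompt =
        "Answer using only the context below.\n\n" +
        "Context:\n" + string.Join("\n", context.Select(c => "- " + c)) + "\n\n" +
        "Question:\n" + question;

    // The model answers from the question plus the retrieved context.
    return generate(prompt);
}

var notes = new[] { "Lacrimosa is my cat.", "Tbilisi is the capital of Georgia." };

// Stand-in retriever: keep notes that share a longer word with the question.
IReadOnlyList<string> FakeRetrieve(string q, int top) =>
    notes.Where(n => q.Split(' ', '?')
                      .Any(w => w.Length > 3 && n.Contains(w, StringComparison.OrdinalIgnoreCase)))
         .Take(top)
         .ToList();

// Stand-in model: returns the prompt unchanged.
string FakeGenerate(string prompt) => prompt;

Console.WriteLine(Answer("Who is Lacrimosa?", FakeRetrieve, FakeGenerate));
```

Swapping the two stand-ins for a real vector search and a real chat client is the whole job of the sample later in this post.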

RAG does not make the model smarter in a general sense. It makes the answer more grounded in data you control.

[Figure: RAG diagram]

RAG matters because it solves a very practical problem: most real systems need answers based on current, domain-specific data.

It also gives you a cleaner operating model:

you can update the knowledge source without retraining the model
you can connect private documents, APIs, or internal data
you can keep the model layer and the data layer loosely coupled

That does not remove all failure modes, but it is usually a much better foundation than hoping the model “just knows” your business context.

Microsoft.Extensions.VectorData

Microsoft.Extensions.VectorData is a .NET library that gives you a consistent abstraction over vector stores.

What I like about it is that it feels familiar from a .NET developer’s perspective. You work with collections, records, and attributes instead of wiring every provider differently. That makes it easier to experiment early and keep the codebase cleaner when the solution grows.

Core components

The main pieces are straightforward.

VectorStore is the entry point. It hands out typed collections and lets you check whether a collection exists, list collection names, and delete collections, all through the same API regardless of the provider behind it.

VectorStoreCollection<TKey, TRecord> represents a concrete collection of records. In most cases, a record includes an ID, some metadata, and one or more vector fields.

The model is shaped with attributes:

VectorStoreKey marks the primary identifier
VectorStoreData marks regular data fields
VectorStoreVector marks the embedding field used for similarity search

This is not magic, and that is a good thing. The library stays close to the underlying concepts while removing a lot of repetitive plumbing.

If you are building RAG in .NET, that is a good tradeoff.

Running the sample locally

I put together a small reference project for this post: RAGrimosa.

It is intentionally simple:

a .NET 10 console app
Postgres with pgvector
local text-file ingestion
OpenAI for embeddings and chat

The fastest way to run it is with Docker Compose.

Option 1: run everything with Docker

Prerequisites:

Docker Desktop or Docker Engine with Compose
an OpenAI API key

Then:

git clone https://github.com/gabisonia/RAGrimosa.git
cd RAGrimosa

Open RAGrimosa/appsettings.json and set your OpenAI API key in the OpenAI section.

Then start the app:

docker compose run --rm --build app

This starts Postgres, enables pgvector, runs ingestion, and then drops you into an interactive prompt:

user >

At that point you can start asking questions about the ingested file.

Option 2: run Postgres in Docker, but run the app with `dotnet`

If you want the database in a container but the app on your machine, use this flow instead.

Prerequisites:

Docker Compose
.NET 10 SDK
an OpenAI API key

First start only the database:

docker compose up -d db

Then update RAGrimosa/appsettings.json so the connection string uses Host=localhost instead of Host=db.

After that, run the console app:

dotnet run --project RAGrimosa/RAGrimosa.csproj

The default setup is small on purpose. That makes it easy to see the RAG pipeline end to end without too much framework noise.

What the code looks like

The full repo is here: https://github.com/gabisonia/RAGrimosa

1. Define the record that goes into the vector store

The DocumentChunk model is where Microsoft.Extensions.VectorData starts to feel useful. The record is plain C#, and the attributes make the intent obvious:

internal sealed class DocumentChunk
{
    [VectorStoreKey]
    public required string Id { get; init; }

    [VectorStoreData]
    public required string Content { get; init; }

    [VectorStoreData]
    public required string Source { get; init; }

    [VectorStoreData]
    public int ChunkIndex { get; init; }

    [VectorStoreVector(Dimensions: 1536, DistanceFunction = DistanceFunction.CosineSimilarity)]
    public string Embedding => Content;
}

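That is one of the things I like about this API. The data model stays readable. You do not need a huge amount of provider-specific setup just to describe what should be indexed.

Note the Embedding property: it is declared as a string that returns Content, not as a float array. With an embedding generator registered (step 2), the collection can generate the vector from that text when a record is upserted, so the record never has to carry the numbers itself. The 1536 dimensions have to match the output size of the embedding model you configure.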

2. Register the chat client, embedding client, and vector collection

The project uses a standard host builder and wires everything in one place:

builder.Services.AddEmbeddingGenerator(sp =>
{
    var options = sp.GetRequiredService<IOptions<OpenAiOptions>>().Value;
    return new EmbeddingClient(options.EmbeddingModel, options.ApiKey).AsIEmbeddingGenerator();
});

builder.Services.AddChatClient(sp =>
{
    var options = sp.GetRequiredService<IOptions<OpenAiOptions>>().Value;
    return new ChatClient(options.ChatModel, options.ApiKey).AsIChatClient();
});

builder.Services.AddPostgresCollection<string, DocumentChunk>(
    postgresConfiguration.CollectionName,
    postgresConfiguration.ConnectionString);

That is the core of the setup. One client for embeddings, one for chat, and one typed collection backed by Postgres.

3. Ingest the source document as chunks

The ingestion service reads the local file, splits it into overlapping chunks, and upserts them into the collection:

var fileContent = await File.ReadAllTextAsync(ingestionOptions.InputFilePath, cancellationToken);
var chunks = SplitIntoChunks(fileContent, ingestionOptions.ChunkSize, ingestionOptions.ChunkOverlap);

var records = new List<DocumentChunk>(chunks.Count);
for (var index = 0; index < chunks.Count; index++)
{
    records.Add(new DocumentChunk
    {
        Id = CreateStableChunkId(sourceName, index),
        Content = chunks[index],
        Source = sourceName,
        ChunkIndex = index,
    });
}

await collection.UpsertAsync(records, cancellationToken: cancellationToken);

This is the part many RAG demos skip over too quickly. Chunking strategy matters. File boundaries matter. Stable IDs matter if you want re-ingestion to behave predictably.
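The snippet above calls SplitIntoChunks and CreateStableChunkId without showing them. Here is one minimal way such helpers could look, assuming fixed-size character windows and a hash-based ID. This is my sketch for this post, not necessarily what the repo does; a production chunker would more likely split on sentences or paragraphs.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Fixed-size character windows; each window starts (chunkSize - overlap)
// characters after the previous one, so neighbours share `overlap` characters.
static List<string> SplitIntoChunks(string text, int chunkSize, int overlap)
{
    if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
    if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

    var result = new List<string>();
    var step = chunkSize - overlap;
    for (var start = 0; start < text.Length; start += step)
    {
        var length = Math.Min(chunkSize, text.Length - start);
        result.Add(text.Substring(start, length));
        if (start + length >= text.Length) break; // this window reached the end
    }
    return result;
}

// The same source name and index always hash to the same ID, so ingesting
// the file again overwrites existing records instead of adding duplicates.
static string CreateStableChunkId(string sourceName, int index)
{
    var hash = SHA256.HashData(Encoding.UTF8.GetBytes($"{sourceName}:{index}"));
    return Convert.ToHexString(hash, 0, 16).ToLowerInvariant();
}

var demoChunks = SplitIntoChunks("abcdefghij", chunkSize: 4, overlap: 1);
Console.WriteLine(string.Join(" | ", demoChunks)); // prints "abcd | defg | ghij"
Console.WriteLine(CreateStableChunkId("notes.txt", 0));
```

The overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk. The stable ID is what makes UpsertAsync behave as an update on the second run.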

4. Retrieve context and build the grounded prompt

Once ingestion is done, the orchestrator searches the collection and sends the retrieved snippets to the chat model:

// Embed the question and fetch the `top` closest chunks.
var searchResults = new List<VectorSearchResult<DocumentChunk>>();
await foreach (var result in collection.SearchAsync(question, top, cancellationToken: cancellationToken))
{
    searchResults.Add(result);
}

// Put the retrieved snippets in front of the question.
var chatMessages = new[]
{
    new ChatMessage(ChatRole.System, ragSettings.SystemPrompt),
    new ChatMessage(ChatRole.User, $"{BuildContextSection(searchResults)}Question:\n{question}"),
};

var response = await chatClient.GetResponseAsync(chatMessages, cancellationToken: cancellationToken);

This is the actual RAG loop in a small amount of code:

search for relevant chunks
build a grounded prompt
send both the question and the retrieved context to the model

That is the part I wanted the sample to make obvious.
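For the "build a grounded prompt" step, a context builder can be as small as the sketch below. It is a hypothetical version of BuildContextSection that takes plain tuples instead of the library's search result type, so it runs on its own. The format (numbered entries tagged with source and chunk index) is one reasonable choice, not the only one, and the file name is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Formats retrieved chunks as a numbered, source-tagged block that is placed
// in front of the question.
static string BuildContextSection(
    IReadOnlyList<(string Source, int ChunkIndex, string Content)> hits)
{
    if (hits.Count == 0)
        return "Context:\n(no relevant documents found)\n\n";

    var builder = new StringBuilder("Context:\n");
    for (var i = 0; i < hits.Count; i++)
    {
        var (source, chunkIndex, content) = hits[i];
        builder.Append($"[{i + 1}] {source}#{chunkIndex}: {content.Trim()}\n");
    }
    return builder.Append('\n').ToString();
}

// Hypothetical search hits, as if they came back from the vector store.
var retrieved = new List<(string Source, int ChunkIndex, string Content)>
{
    ("notes.txt", 0, "Lacrimosa is my cat."),
    ("notes.txt", 1, "RAGrimosa is a small .NET sample."),
};

var userQuestion = "Who is Lacrimosa?";
Console.WriteLine($"{BuildContextSection(retrieved)}Question:\n{userQuestion}");
```

Tagging each snippet with its source also makes it easy to ask the model to cite where an answer came from.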

What I like about RAG is that it solves a very normal engineering problem. The model does not know your private context, your documents, or facts like who Lacrimosa is. You have to give it that context in a way that is structured and repeatable.

That is why I like this setup. A small console app, Postgres with pgvector, and Microsoft.Extensions.VectorData are enough to show the idea clearly. No big abstractions, no unnecessary layers, just the core flow from ingestion to retrieval to answer.

Source code: https://github.com/gabisonia/RAGrimosa