
The RAG Trap: Why Your Vector Database Is a Security Liability

Most RAG implementations flatten access controls. If a user asks "What are the CEO's bonuses?", the vector database retrieves it because it's relevant—ignoring the fact that the user shouldn't see it.

Retrieval-Augmented Generation (RAG) is the standard pattern for enterprise AI. You take your documents, chunk them, embed them, and store them in a vector database. When a user asks a question, you find the most similar chunks and feed them to the LLM. It’s elegant, efficient, and—in 90% of deployments—completely insecure.

The problem isn't the AI. It's the retrieval.

In your file system (SharePoint, Google Drive, Box), documents have complex Access Control Lists (ACLs). Only HR sees salary data. Only Legal sees active litigation. Only Execs see M&A targets.

But when you scrape those documents into a vector store, you often strip those permissions away. You create a flat index where "relevance" is the only metric that matters. If a junior engineer asks a question that is semantically close to a confidential strategy document, the vector database happily retrieves it.
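A minimal sketch makes the flattening concrete. Everything below is illustrative (toy vectors stand in for a real embedding model): the index stores only text and embeddings, so by the time a query arrives there is nothing left to check permissions against.

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# A flat index: each chunk keeps only its text and embedding.
# The source ACLs were stripped at ingestion time.
flat_index = [
    {"text": "Generic HR bonus policy (public)",
     "vector": [0.9, 0.1, 0.0]},
    {"text": "Executive Compensation Committee minutes (was restricted)",
     "vector": [0.8, 0.2, 0.1]},
]

def retrieve(query_vector, k=2):
    # Relevance is the only criterion; no permission check is possible
    # because the ACLs no longer exist anywhere in the index.
    ranked = sorted(flat_index,
                    key=lambda c: cosine(query_vector, c["vector"]),
                    reverse=True)
    return [c["text"] for c in ranked[:k]]

# A junior engineer's query lands near both chunks, so both come back.
results = retrieve([0.85, 0.15, 0.05])
```

The once-restricted minutes are retrieved purely because they are semantically close to the query.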

The Silent Leak: Even if the LLM is instructed not to reveal secrets, the secret data has already been retrieved and injected into the context window. It exists in the logs, in the cache, and potentially in the model's generated reasoning—even if the final output is sanitized.

The ACL Flattening Problem

Why does this happen? Because vector databases are search engines, not permission engines.

In a traditional enterprise search setup, the indexer mirrors the ACLs of the source document. When a user searches, the system checks their credentials against the index before returning results.

In the rush to deploy GenAI, many organizations skipped this step. They built "knowledge bases" that aggregate data from multiple silos into a single Pinecone or Milvus index, often using a single service account to read the source data.

The "CEO Bonus" Scenario

User: "How are performance bonuses calculated this year?"

Vector DB: Scans for "bonus calculation." Finds the generic HR policy (public) and the Executive Compensation Committee minutes (restricted).

Result: Both documents are retrieved because they are semantically "relevant" to the query. The LLM receives both. If the prompt isn't perfect, the LLM summarizes the executive metrics for the junior employee.

The "Silent Retrieval" Risk

Security teams often try to patch this by adding a system prompt: "Do not reveal confidential information."

This is security theater. By the time the model sees that instruction, the confidential data is already in the context window.

The Solution: Late-Binding Permissions

You cannot rely on the LLM to enforce security. Security must happen at retrieval time.

The Sovereign Institute advocates for a Late-Binding Permission Architecture.

1. Document-Level Tagging

When ingestion happens, you don't just store the vector. You store the metadata of the source ACLs.
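A sketch of what ACL-aware ingestion can look like. The names here are illustrative, and `embed` is a placeholder for whatever embedding model you actually use; the point is that the chunk record carries the source document's group ACL alongside the vector.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    text: str
    vector: tuple              # the embedding
    allowed_groups: frozenset  # mirrored from the source document's ACL

def ingest(text, embed, acl_groups):
    # Store the ACL groups alongside the vector so retrieval can
    # filter on them later. `embed` stands in for an embedding model.
    return Chunk(text=text,
                 vector=tuple(embed(text)),
                 allowed_groups=frozenset(acl_groups))

fake_embed = lambda t: [1.0, 0.0]  # placeholder embedder for the sketch
minutes = ingest("Executive Compensation Committee minutes",
                 fake_embed,
                 acl_groups=["exec-comp", "hr-leads"])
```

Most vector databases support this directly as metadata on the stored record, so no custom storage layer is required.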

2. Query-Time Filtering

When a user submits a query, the system identifies the user and resolves their Active Directory groups.

The vector search includes a pre-filter: "Find vectors near this query, BUT ONLY WHERE metadata.groups overlaps with user.groups."
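In code, that pre-filter can be sketched like this (an in-memory stand-in for a real vector database; names and data are illustrative). Note that the permission check runs before similarity ranking, so restricted chunks are never candidates at all.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(index, query_vector, user_groups, k=5):
    # Pre-filter: a chunk is only ELIGIBLE for similarity ranking if
    # its ACL groups overlap the caller's resolved directory groups.
    visible = [c for c in index if c["groups"] & user_groups]
    visible.sort(key=lambda c: dot(query_vector, c["vector"]), reverse=True)
    return visible[:k]

index = [
    {"text": "HR bonus policy", "vector": [0.9, 0.1],
     "groups": {"all-staff"}},
    {"text": "Exec comp minutes", "vector": [0.8, 0.2],
     "groups": {"exec-comp"}},
]

# A junior engineer in 'all-staff' never sees the restricted chunk,
# no matter how similar it is to the query.
hits = search(index, [0.85, 0.15], user_groups={"all-staff"})
```

Production vector databases expose the same idea as a native metadata filter passed with the query, which keeps the filtering on the database side rather than in application code.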

3. The "Empty Result" Principle

If the only relevant documents are ones the user can't see, the system must return nothing. It should not return a message saying "I found relevant documents but you can't see them" (which leaks existence). It should behave as if the information does not exist.
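A sketch of the response path under this principle (the wording of the fallback message is illustrative): the filtered result set is the only thing the answer layer ever sees, so an empty set and a genuinely missing topic are indistinguishable to the user.

```python
NOT_FOUND = "I couldn't find anything about that."

def answer(hits):
    # If the permission filter left nothing, behave as if the
    # information does not exist. Never emit "documents exist but
    # you can't see them" -- that alone leaks existence.
    if not hits:
        return NOT_FOUND
    return "Context: " + "; ".join(h["text"] for h in hits)
```

Because the same `NOT_FOUND` message covers both "no match" and "no permitted match", an attacker cannot probe the index by watching for access-denied responses.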

Strategy                             Security Level   Complexity                  Latency Impact
Separate Indexes                     High             High (manage 50+ indexes)   Low
Post-Retrieval Filter                Medium           Low                         High (fetch 100, filter to 5)
Pre-Computation (ACLs in Metadata)   High             Medium                      Low (native vector DB filtering)
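The cost in the Post-Retrieval Filter row is easy to see in a sketch (toy data; the 100-to-5 ratio mirrors the table): the system must over-fetch because it cannot know in advance how many of the top-ranked chunks the caller is allowed to see.

```python
def post_retrieval_filter(ranked_chunks, user_groups, k=5):
    # Post-retrieval filtering must over-fetch: of the 100 top-ranked
    # chunks, only a handful may survive the permission check, and a
    # second round trip is needed if fewer than k remain.
    survivors = [c for c in ranked_chunks if c["groups"] & user_groups]
    return survivors[:k]

# Pretend the vector DB returned 100 ranked chunks, of which only
# every 20th is visible to the caller.
ranked = [{"text": f"chunk-{i}",
           "groups": {"all-staff"} if i % 20 == 0 else {"exec-comp"}}
          for i in range(100)]

kept = post_retrieval_filter(ranked, {"all-staff"})
```

Pushing the same group-overlap check into the database as a pre-filter (the bottom row) avoids the over-fetch entirely.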

Sovereign Architecture Advantage

This is where sovereign architecture shines. Implementing granular, user-level ACL filtering is incredibly difficult on a third-party RAG-as-a-Service platform; those platforms rarely have deep visibility into your Active Directory.

When you own the stack—the ingestion pipeline, the vector store, and the retrieval logic—you can enforce "permissions first, relevance second."

Is your RAG leaking?

The TSI Framework includes the "Secure Retrieval Pattern" for implementing AD-aware vector search.

Explore the Framework