Retrieval-Augmented Generation (RAG) is the standard pattern for enterprise AI. You take your documents, chunk them, embed them, and store them in a vector database. When a user asks a question, you find the most similar chunks and feed them to the LLM. It’s elegant, efficient, and—in 90% of deployments—completely insecure.
The problem isn't the AI. It's the retrieval.
In your file system (SharePoint, Google Drive, Box), documents have complex Access Control Lists (ACLs). Only HR sees salary data. Only Legal sees active litigation. Only Execs see M&A targets.
But when you scrape those documents into a vector store, you often strip those permissions away. You create a flat index where "relevance" is the only metric that matters. If a junior engineer asks a question that is semantically close to a confidential strategy document, the vector database happily retrieves it.
The Silent Leak: Even if the LLM is instructed not to reveal secrets, the secret data has already been retrieved and injected into the context window. It exists in the logs, in the cache, and potentially in the model's generated reasoning—even if the final output is sanitized.
## The ACL Flattening Problem
Why does this happen? Because vector databases are search engines, not permission engines.
In a traditional enterprise search setup, the indexer mirrors the ACLs of the source document. When a user searches, the system checks their credentials against the index before returning results.
In the rush to deploy GenAI, many organizations skipped this step. They built "knowledge bases" that aggregate data from multiple silos into a single Pinecone or Milvus index, often using a single service account to read the source data.
### The "CEO Bonus" Scenario
- User: "How are performance bonuses calculated this year?"
- Vector DB: Scans for "bonus calculation." Finds the generic HR policy (public) and the Executive Compensation Committee minutes (restricted).
- Result: Both documents are retrieved because they are semantically "relevant" to the query. The LLM receives both. If the prompt isn't perfect, the LLM summarizes the executive metrics for the junior employee.
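The failure mode can be sketched with a toy in-memory index. The vectors and document names below are illustrative only (not a real embedding model): with no permission check anywhere in the pipeline, top-k is decided purely by similarity, so the restricted minutes come back alongside the public policy.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# A flat index: no ACL metadata survived ingestion.
index = [
    {"doc": "HR bonus policy (public)",       "vec": [0.9, 0.1]},
    {"doc": "Exec comp minutes (restricted)", "vec": [0.8, 0.3]},
    {"doc": "Office parking rules (public)",  "vec": [0.1, 0.9]},
]

query = [0.85, 0.2]  # "How are performance bonuses calculated?"

# No permission check: top-k is decided by relevance alone.
top2 = sorted(index, key=lambda d: cosine(query, d["vec"]), reverse=True)[:2]
print([d["doc"] for d in top2])
# → ['HR bonus policy (public)', 'Exec comp minutes (restricted)']
```

The restricted document is now in the context window regardless of who asked.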
### The "Silent Retrieval" Risk
Security teams often try to patch this by adding a system prompt: "Do not reveal confidential information."
This is security theater. By the time the model sees that instruction, the confidential data is already in the context window.
- Log Exposure: Most LLM gateways log the full prompt (including retrieved context) for debugging. Your logs now contain the secrets.
- Provider Exposure: If you use a cloud API, you just sent that secret document to OpenAI or Anthropic, regardless of whether the model outputs it.
- Prompt Injection: A clever user can override the "do not reveal" instruction if the data is present in the context.
## The Solution: Late-Binding Permissions
You cannot rely on the LLM to enforce security. Security must happen at retrieval time.
The Sovereign Institute advocates for a Late-Binding Permission Architecture.
### 1. Document-Level Tagging
At ingestion time, you store more than the vector: you also store metadata mirroring the source document's ACLs.

- Vector: `[0.12, -0.45, ...]`
- Metadata: `groups: ["hr-admin", "exec-team"]`
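A sketch of what such an ingestion record might look like. The field names and the `sharepoint://` URI are illustrative assumptions, not a specific vector-DB schema; the point is that the ACL groups travel with the chunk.

```python
from dataclasses import dataclass, field

@dataclass
class ChunkRecord:
    chunk_id: str
    vector: list          # embedding of the chunk text
    text: str             # the chunk itself
    source_uri: str       # provenance, useful for audits
    groups: list = field(default_factory=list)  # ACLs mirrored from the source

# Illustrative record; all values are made up.
record = ChunkRecord(
    chunk_id="exec-minutes-0007",
    vector=[0.12, -0.45],
    text="Executive bonus multiplier set to ...",
    source_uri="sharepoint://exec/comp-committee/minutes.docx",
    groups=["hr-admin", "exec-team"],
)
print(record.groups)
# → ['hr-admin', 'exec-team']
```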
### 2. Query-Time Filtering
When a user submits a query, the system identifies the user and resolves their Active Directory groups.
The vector search includes a pre-filter: "Find vectors near this query, BUT ONLY WHERE metadata.groups overlaps with user.groups."
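A minimal sketch of that pre-filter in plain Python, standing in for a vector database's native metadata filtering (data and group names are illustrative): candidates whose ACL groups do not overlap the user's groups are excluded before similarity ranking ever sees them.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

index = [
    {"doc": "HR bonus policy",   "vec": [0.9, 0.1], "groups": {"all-staff"}},
    {"doc": "Exec comp minutes", "vec": [0.8, 0.3], "groups": {"exec-team", "hr-admin"}},
]

def search(query_vec, user_groups, top_k=5):
    # Permissions first: drop anything the user cannot see.
    allowed = [d for d in index if d["groups"] & user_groups]
    # Relevance second: rank only the surviving candidates.
    ranked = sorted(allowed, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["doc"] for d in ranked[:top_k]]

print(search([0.85, 0.2], {"all-staff"}))               # junior engineer
# → ['HR bonus policy']
print(search([0.85, 0.2], {"all-staff", "exec-team"}))  # executive
# → ['HR bonus policy', 'Exec comp minutes']
```

The same query returns different result sets for different users, and the restricted document never enters the context window for the junior engineer.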
### 3. The "Empty Result" Principle
If the only relevant documents are ones the user can't see, the system must return nothing. It should not return a message saying "I found relevant documents but you can't see them" (which leaks existence). It should behave as if the information does not exist.
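One way to sketch this principle (the response string and function shape are illustrative): the reply for "everything was filtered out" must be byte-identical to the reply for "nothing matched at all", so the answer itself cannot leak that restricted material exists.

```python
NO_RESULT = "I don't have information on that topic."

def respond(all_matches, accessible_matches):
    # Whether the ACL filter removed everything or nothing matched in
    # the first place, the user-visible reply is identical.
    if not accessible_matches:
        return NO_RESULT
    return f"Answer based on {len(accessible_matches)} document(s)."

no_match_at_all  = respond(all_matches=[], accessible_matches=[])
filtered_to_zero = respond(all_matches=["exec minutes"], accessible_matches=[])

assert no_match_at_all == filtered_to_zero == NO_RESULT
print(no_match_at_all)
# → I don't have information on that topic.
```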
| Strategy | Security Level | Complexity | Latency Impact |
|---|---|---|---|
| Separate Indexes | High | High (Manage 50+ indexes) | Low |
| Post-Retrieval Filter | Medium | Low | High (Fetch 100, filter to 5) |
| Pre-Computation (ACLs in Metadata) | High | Medium | Low (Native vector DB filtering) |
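The latency column can be made concrete with a toy comparison (synthetic scores and group names, illustrative only): post-retrieval filtering must over-fetch and then discard, while pre-computed ACL metadata makes the check part of the query itself.

```python
# 95 restricted documents outscore 5 public ones for this query.
index = [{"doc": f"restricted-{i}", "score": 1.0 - i * 0.01, "groups": {"exec-team"}}
         for i in range(95)]
index += [{"doc": f"public-{i}", "score": 0.5 - i * 0.01, "groups": {"all-staff"}}
          for i in range(5)]

user_groups = {"all-staff"}

# Post-retrieval filter: fetch top 100 by score, then discard what the
# user cannot see. 95 of the 100 fetched documents are wasted work.
fetched = sorted(index, key=lambda d: d["score"], reverse=True)[:100]
post = [d["doc"] for d in fetched if d["groups"] & user_groups][:5]

# Pre-computation: the ACL check runs before ranking, so only
# permitted documents are ever scored.
pre = [d["doc"] for d in sorted((d for d in index if d["groups"] & user_groups),
                                key=lambda d: d["score"], reverse=True)[:5]]

assert post == pre  # same answer, very different cost profile
print(pre)
# → ['public-0', 'public-1', 'public-2', 'public-3', 'public-4']
```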
## Sovereign Architecture Advantage
This is where sovereign architecture shines. Implementing granular, user-level ACL filtering is incredibly difficult when using a third-party RAG-as-a-Service platform. They rarely have deep visibility into your Active Directory.
When you own the stack—the ingestion pipeline, the vector store, and the retrieval logic—you can enforce "permissions first, relevance second."
## Is your RAG leaking?
The TSI Framework includes the "Secure Retrieval Pattern" for implementing AD-aware vector search.
Explore the Framework