A financial services firm deployed a RAG system for internal policy lookup. Initial testing showed 94% retrieval accuracy. Six months later, support tickets about "wrong answers" had tripled. When they finally investigated, retrieval accuracy had dropped to 57%—and the degradation had been gradual enough that no single day triggered an alert.
The culprit wasn't the language model. It wasn't the prompts. It was embedding drift—a systematic divergence between how queries and documents are represented in vector space that accumulates invisibly until the system fails.
The hidden assumption: RAG systems assume that queries and documents embedded at different times will land in comparable regions of vector space. This assumption breaks in multiple ways, and most teams don't monitor for it.
What Is Embedding Drift?
Embedding models convert text into high-dimensional vectors. Similar meanings should produce similar vectors, enabling semantic search. But "similar" is relative to how the embedding model was trained—and that relationship isn't stable over time in production systems.
Drift occurs when the relationship between query embeddings and document embeddings changes, causing retrieval to return less relevant results even though neither the queries nor the documents have changed.
The Three Types of Drift
| Drift Type | Cause | Detection Difficulty |
|---|---|---|
| Model drift | Embedding model updated by provider | Easy (if you track versions) |
| Corpus drift | New documents change vector space distribution | Medium (requires distribution monitoring) |
| Query drift | User query patterns evolve over time | Hard (requires query analysis) |
Model Drift: The Obvious One
When your embedding provider updates their model, vectors generated before and after the update won't be compatible. A document embedded with v1 and a query embedded with v2 may not match correctly, even for identical text.
Real Example
When OpenAI released its text-embedding-3 models in January 2024, the vectors they produced were incompatible with indexes built on text-embedding-ada-002. Organizations that had embedded millions of documents with the older model couldn't simply switch: queries embedded with the new model retrieved different—often worse—results from the existing index until the entire corpus was re-embedded.
Model drift is the most obvious form because it happens discretely—there's a clear before and after. But it's also the most dangerous for cloud API users because the provider controls when it happens.
Corpus Drift: The Subtle One
Even with a stable embedding model, adding documents changes the vector space. New documents shift the distribution of vectors, affecting which existing documents get retrieved for any given query.
How It Works
Vector similarity is relative, not absolute. When you search for the "top 5 most similar" documents, you're comparing against all documents in the index. Adding new documents changes what "most similar" means.
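A toy sketch of this effect, using random vectors as stand-ins for real embeddings (the `top_k` helper and the dimensions are illustrative, not a real retrieval stack): the query and the original documents never change, yet the top-k set can after new documents arrive.

```python
import numpy as np

def top_k(query, docs, k=2):
    """Return indices of the k documents most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sims = d @ q
    return list(np.argsort(-sims)[:k])

rng = np.random.default_rng(0)
query = rng.normal(size=8)

# Original curated corpus: 5 documents
corpus = rng.normal(size=(5, 8))
before = top_k(query, corpus)

# Corpus drift: 20 new documents added; none of the originals changed
grown = np.vstack([corpus, rng.normal(size=(20, 8))])
after = top_k(query, grown)

# "Most similar" is a ranking over the whole index, so the new arrivals
# can displace previously top-ranked documents.
print(before, after)
```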
The Dilution Effect
Your index starts with 10,000 carefully curated policy documents. Over six months, 50,000 meeting notes, emails, and drafts are added. Now when users search for policy information, meeting notes that happen to contain policy keywords often outrank the actual policies—because there are simply more of them in relevant regions of vector space.
Cluster Collapse
A more severe form: as similar documents accumulate, they form dense clusters in vector space. Queries near these clusters consistently retrieve documents from the cluster, even when more relevant documents exist elsewhere. The dense cluster "captures" queries that should go elsewhere.
Query Drift: The Invisible One
User behavior evolves. The queries your system receives in month 12 aren't the same as month 1—different terminology, different topics, different phrasing. If your document embeddings don't evolve correspondingly, retrieval degrades.
Terminology Evolution
Organizations adopt new terms, products get renamed, industry jargon shifts. Documents embedded with old terminology don't match queries using new terminology, even when they're semantically about the same thing.
Example: Product Rename
A company renames "Project Atlas" to "Horizon Platform." All documentation still references "Atlas." Users searching for "Horizon" get poor results because the embeddings don't know these terms are equivalent. The documents are correct; the vector space doesn't reflect current usage.
Detecting Drift
You can't fix what you can't see. Drift detection requires instrumentation that most RAG deployments lack.
Retrieval Quality Metrics
- Hit rate: Percentage of queries where relevant documents appear in top-k results
- Mean reciprocal rank: How high relevant documents rank on average
- Similarity score distribution: Are top results getting less similar over time?
- User feedback correlation: Do thumbs-down responses correlate with low similarity scores?
The baseline problem: You need ground truth to measure retrieval quality. Without labeled query-document pairs, you're measuring proxies. Build evaluation sets early and maintain them.
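The first two metrics are straightforward to compute once you have a labeled evaluation set. A minimal sketch, assuming retrieved document IDs per query and hand-labeled relevant sets (the IDs below are hypothetical):

```python
def hit_rate(results, relevant, k=5):
    """Fraction of queries whose top-k results contain at least one relevant doc."""
    hits = sum(1 for res, rel in zip(results, relevant)
               if any(doc in rel for doc in res[:k]))
    return hits / len(results)

def mean_reciprocal_rank(results, relevant):
    """Average of 1/rank of the first relevant doc per query (0 if none retrieved)."""
    total = 0.0
    for res, rel in zip(results, relevant):
        for rank, doc in enumerate(res, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(results)

# Hypothetical evaluation set: retrieved doc IDs per query + ground-truth labels
results  = [["d3", "d1", "d7"], ["d2", "d9", "d4"], ["d8", "d5", "d6"]]
relevant = [{"d1"}, {"d4"}, {"d0"}]

print(hit_rate(results, relevant, k=3))         # 2 of 3 queries hit
print(mean_reciprocal_rank(results, relevant))  # (1/2 + 1/3 + 0) / 3
```

Run both weekly against the same frozen evaluation set; a downward trend in either is the drift signal the opening anecdote's team was missing.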
Distribution Monitoring
Track statistical properties of your vector space over time:
- Centroid shift: Is the average document vector moving?
- Variance changes: Are vectors becoming more or less spread out?
- Cluster analysis: Are new clusters forming? Are existing clusters growing disproportionately?
- Query-corpus alignment: Do query vectors land in regions with documents?
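The first two checks reduce to a few lines of NumPy. A sketch, assuming you periodically snapshot the index's vectors (the simulated "drifted" snapshot below just mixes in an offset cluster):

```python
import numpy as np

def centroid_shift(baseline, current):
    """Cosine distance between the mean vectors of two index snapshots."""
    a, b = baseline.mean(axis=0), current.mean(axis=0)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def variance_ratio(baseline, current):
    """Ratio of mean per-dimension variance; > 1.0 means the space is spreading out."""
    return current.var(axis=0).mean() / baseline.var(axis=0).mean()

rng = np.random.default_rng(1)
snapshot_jan = rng.normal(size=(1000, 16))
# Simulated corpus drift: a later snapshot with an offset cluster mixed in
snapshot_jun = np.vstack([snapshot_jan, rng.normal(loc=2.0, size=(400, 16))])

print(centroid_shift(snapshot_jan, snapshot_jan))  # 0.0: no drift against itself
print(centroid_shift(snapshot_jan, snapshot_jun))  # > 0: the centroid has moved
```

Alert thresholds are corpus-specific; the useful signal is the trend across snapshots, not any single absolute value.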
Model Version Tracking
For every document in your index, store:
- Embedding model identifier and version
- Timestamp of embedding generation
- Hash of source content (to detect if re-embedding is needed)
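One way to sketch that metadata record (the field names and model identifiers are illustrative, not a prescribed schema):

```python
import hashlib
from dataclasses import dataclass

@dataclass
class EmbeddingRecord:
    doc_id: str
    model_name: str       # e.g. "text-embedding-ada-002"
    model_version: str
    embedded_at: str      # ISO-8601 timestamp of embedding generation
    content_hash: str     # SHA-256 of the source text

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_reembedding(record: EmbeddingRecord, current_text: str,
                      current_model: str, current_version: str) -> bool:
    """Re-embed if the source text changed or the model moved underneath it."""
    return (record.content_hash != content_hash(current_text)
            or record.model_name != current_model
            or record.model_version != current_version)
```

With this record per document, a re-embedding pass can skip everything whose content and model are unchanged, which is what makes the continuous re-embedding strategy below affordable in practice.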
Preventing and Mitigating Drift
Strategy 1: Version Pinning
Lock your embedding model version and don't update without re-embedding your entire corpus. This prevents model drift but requires re-embedding capability.
- Sovereign advantage: You control when models update
- API risk: Providers may deprecate versions, forcing migration
- Cost consideration: Re-embedding large corpora is expensive with pay-per-token APIs
Strategy 2: Continuous Re-embedding
Periodically re-embed your entire corpus with the current model version. This treats the vector index as ephemeral—regenerated on a schedule rather than maintained indefinitely.
| Corpus Size | Re-embedding Frequency | Typical Cost (API) | Typical Cost (Sovereign) |
|---|---|---|---|
| 100K documents | Monthly | $500/month | ~$20/month (compute) |
| 1M documents | Quarterly | $5,000/quarter | ~$200/quarter |
| 10M documents | Semi-annually | $50,000/cycle | ~$2,000/cycle |
Strategy 3: Query-Time Alignment
Instead of re-embedding documents, adjust queries to align with the document embedding space. Techniques include:
- Query expansion: Add synonyms and related terms to queries
- Learned query transformation: Train a model to transform queries for better retrieval
- Hybrid search: Combine vector search with keyword search to catch drift-related misses
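Query expansion is the simplest of the three. A minimal sketch using a hand-maintained alias map (the `ALIASES` table is a hypothetical example, keyed to the "Project Atlas" rename above; production systems would learn or curate this mapping):

```python
# Hypothetical alias map: new terminology -> old terms still used in documents
ALIASES = {
    "horizon": ["atlas"],
    "horizon platform": ["project atlas"],
}

def expand_query(query: str) -> str:
    """Append known old terms so documents embedded with stale vocabulary still match."""
    lowered = query.lower()
    extra = []
    for new_term, old_terms in ALIASES.items():
        if new_term in lowered:
            extra.extend(t for t in old_terms if t not in lowered)
    return query if not extra else f"{query} ({' '.join(extra)})"

print(expand_query("Horizon Platform onboarding policy"))
```

The expanded string is what gets embedded, pulling the query vector back toward the region where the old-terminology documents live.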
Strategy 4: Hierarchical Indexing
Segment your corpus into separate indices by time period, document type, or topic. Query multiple indices and merge results. This limits the impact of drift within any single index.
Time-Windowed Indices
Index A: Documents from 2022 (stable, rarely queried)
Index B: Documents from 2023 (stable)
Index C: Documents from 2024 (active, frequently updated)
Re-embed only the active index regularly. Historical indices remain stable. Queries search all indices and merge results.
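A sketch of the fan-out-and-merge step, assuming each index exposes a search function returning `(doc_id, score)` pairs with scores comparable across indices (an assumption—if the indices were embedded with different model versions, normalize before merging):

```python
def search_all(query_vec, indices, k=5):
    """Query every time-windowed index, merge by score, return the global top-k.

    `indices` maps an index name to a search callable: (query_vec, k) -> [(doc_id, score)].
    """
    merged = []
    for name, search in indices.items():
        merged.extend((doc_id, score, name) for doc_id, score in search(query_vec, k))
    merged.sort(key=lambda t: t[1], reverse=True)
    return merged[:k]

# Hypothetical stub indices standing in for real vector stores
indices = {
    "2023": lambda q, k: [("old-1", 0.4), ("old-2", 0.3)][:k],
    "2024": lambda q, k: [("new-1", 0.9), ("new-2", 0.2)][:k],
}
print(search_all(None, indices, k=3))
```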
The Hybrid Search Solution
Pure vector search is most vulnerable to drift. Hybrid approaches that combine vector similarity with keyword matching are more resilient.
Why Hybrid Works
- Keyword matching is stable: BM25 and similar algorithms don't drift—"policy" always matches "policy"
- Complementary failures: When vector search fails due to drift, keyword search often succeeds (and vice versa)
- Tunable balance: Adjust vector vs. keyword weighting based on observed retrieval quality
Implementation Pattern
- Execute vector search, retrieve top 20 candidates with similarity scores
- Execute keyword search on same query, retrieve top 20 candidates with BM25 scores
- Normalize scores to comparable ranges
- Combine with weighted formula:
  final_score = α × vector_score + (1 − α) × keyword_score
- Re-rank combined results, return top k
Start with α = 0.7 (favor vectors), monitor retrieval quality, adjust based on observed drift.
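The steps above can be sketched as follows, assuming min-max normalization and pre-computed candidate scores (the score values are illustrative; real BM25 and cosine scores come from your search backends):

```python
def minmax(scores):
    """Scale a list of scores into [0, 1]; degenerate lists map to 0.5."""
    lo, hi = min(scores), max(scores)
    return [0.5] * len(scores) if hi == lo else [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(vector_hits, keyword_hits, alpha=0.7, k=5):
    """Fuse vector-similarity and BM25 candidate lists with a weighted sum.

    Each argument maps doc_id -> raw score. A doc missing from one list
    contributes 0 for that component after normalization.
    """
    docs = set(vector_hits) | set(keyword_hits)
    v_norm = dict(zip(vector_hits, minmax(list(vector_hits.values())))) if vector_hits else {}
    k_norm = dict(zip(keyword_hits, minmax(list(keyword_hits.values())))) if keyword_hits else {}
    fused = {d: alpha * v_norm.get(d, 0.0) + (1 - alpha) * k_norm.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:k]

ranked = hybrid_rank({"a": 0.9, "b": 0.5, "c": 0.1}, {"b": 12.0, "d": 3.0})
print(ranked)
```

Min-max is one of several normalization choices; reciprocal rank fusion is a common alternative that sidesteps score scaling entirely by combining ranks instead of scores.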
Sovereign Architecture Advantages
Why Sovereign Deployment Reduces Drift Risk
Version Control
You decide when to update embedding models. No surprise provider updates breaking your index.
Affordable Re-embedding
Re-embed your entire corpus for compute cost only. No per-token fees making maintenance prohibitive.
Custom Monitoring
Instrument your pipeline however needed. Track distributions, detect drift, alert on degradation.
Fine-tuned Embeddings
Adapt embedding models to your domain vocabulary, reducing terminology-related drift.
Operational Checklist
For production RAG systems, implement these drift countermeasures:
- Track model versions: Log which model version created each embedding
- Build evaluation sets: Maintain labeled query-document pairs for quality measurement
- Monitor retrieval metrics: Track hit rate, MRR, and similarity distributions weekly
- Implement hybrid search: Don't rely on vectors alone
- Plan re-embedding: Budget for periodic full re-embedding, know the cost and timeline
- Segment indices: Isolate stable content from frequently updated content
- Alert on degradation: Set thresholds for retrieval quality; investigate drops promptly
Building RAG systems that need to last?
The TSI Framework includes monitoring patterns and re-embedding strategies for production retrieval systems.
Explore the Framework