A lot of the recent attention on vector databases makes them sound like a brand-new invention created by the generative AI boom. That is not really true.

What changed is not the underlying math. What changed is the workload.

For more than a decade, industry and academia had already been working on large-scale nearest-neighbor search for recommendation systems, image retrieval, search, ads, and ranking. The generative AI wave did something different: it turned vector retrieval from a specialized backend capability into a mainstream application primitive. Once teams started building retrieval-augmented generation (RAG), long-term AI memory, semantic search, and tool-using agents, vector databases stopped being niche infrastructure and became part of the standard stack.

This post is a deep technical guide to that shift:

  1. what a vector database actually is
  2. how the field evolved before and after generative AI became popular
  3. the key implementation details that matter in practice
  4. how vector databases relate to embedding models
  5. how they fit into Agentic AI systems for memory, knowledge, and context management
  6. which open source and commercial systems matter today

1) What is a vector database?

A vector database is a storage and retrieval system optimized for high-dimensional embeddings and similarity search.

Instead of retrieving rows primarily by exact keys or boolean predicates, a vector database retrieves items by closeness in vector space. That closeness usually means one of these distance/similarity functions:

  • cosine similarity
  • dot product / inner product
  • Euclidean (L2) distance

In practice, a vector database stores something like this for each item:

  • an ID
  • a vector embedding
  • optional payload/metadata such as source, author, tenant, tags, time range, permissions
  • optional raw text or object reference

A query is often:

  1. take a query string, image, or event
  2. convert it into an embedding using a model
  3. search for the nearest vectors
  4. optionally apply metadata filters
  5. return top-k candidates
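
As a sketch, here is that flow as a toy, self-contained Python example. The embed() function below is a fake stand-in for a real embedding model, and nothing here is any specific product's API:

import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Fake embedding model: deterministic pseudo-random unit vector per text.
    rng = np.random.default_rng(sum(text.encode()))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# Steps 1-2: take a query and convert it into an embedding.
query_vec = embed("how do I reset my password?")

# A tiny "collection" of stored items with vectors and metadata.
items = [
    {"id": "doc_1", "vec": embed("password reset steps"), "tenant": "acme"},
    {"id": "doc_2", "vec": embed("quarterly revenue report"), "tenant": "acme"},
    {"id": "doc_3", "vec": embed("reset a password via email"), "tenant": "other"},
]

# Step 4: apply a metadata filter; steps 3 and 5: rank by similarity, take top-k.
candidates = [it for it in items if it["tenant"] == "acme"]
candidates.sort(key=lambda it: float(query_vec @ it["vec"]), reverse=True)
print([it["id"] for it in candidates[:2]])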

So the short definition is:

A vector database is a database or retrieval engine that stores embeddings and supports fast approximate nearest-neighbor search, usually with metadata filtering, updates, and operational capabilities for production use.

That last clause matters. A vector index is not the same thing as a vector database.

  • Vector index = the search data structure, such as HNSW or IVF-PQ
  • Vector database = the operational system around the index: ingestion, persistence, filters, replication, access control, updates, observability, APIs, multi-tenancy, backups, and sometimes hybrid retrieval

If you want an intuition: a vector database is to embeddings what an inverted index is to keyword search. It is the infrastructure layer that makes semantic retrieval fast enough and manageable enough to use in real systems.


2) Why ordinary databases are not enough

A traditional relational database is excellent at questions like:

  • find customer 12345
  • return invoices created after May 1
  • join orders with order_items
  • filter users where country = “DE” and status = “active”

But semantic retrieval asks a different question:

  • find chunks that mean roughly the same thing as this query
  • find similar images
  • find related past incidents
  • find memory entries that are conceptually close even when wording differs

You can store embeddings in a normal database table. That does not automatically solve the hard part: efficiently searching millions or billions of high-dimensional vectors with low latency.

The brute-force approach is exact search:

  • compute distance from the query to every vector
  • sort results
  • return top-k

That is perfectly fine for small collections. It becomes expensive at production scale.
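
To make the cost concrete, here is exact top-k search in plain NumPy over random data; the full scan on the distance line is exactly what stops scaling as n grows:

import numpy as np

n, d, k = 100_000, 768, 10
db = np.random.rand(n, d).astype(np.float32)   # ~300 MB here; at billions, RAM alone hurts
q = np.random.rand(d).astype(np.float32)

# Compute the distance from the query to every vector: O(n * d) work per query.
dists = np.linalg.norm(db - q, axis=1)

# Take top-k: argpartition avoids a full O(n log n) sort.
idx = np.argpartition(dists, k)[:k]
topk = idx[np.argsort(dists[idx])]
print(topk, dists[topk])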

The entire vector database field exists because high-dimensional similarity search needs specialized indexing strategies, memory layouts, compression techniques, and query execution paths.


3) Where the vectors come from: embedding models

Vector databases only make sense because of embedding models.

An embedding model maps data into a vector space such that semantically similar objects are placed near each other. The object could be:

  • text
  • code
  • image
  • audio
  • video
  • user behavior
  • graph nodes
  • products or documents

For text systems, the usual workflow is:

  • split documents into chunks
  • embed each chunk into a dense vector
  • store vectors plus metadata in the database
  • embed the user query
  • retrieve nearest chunks

This is dense retrieval.
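
A minimal sketch of the ingestion side, reusing the toy embed() from section 1. The fixed-size character chunker and its parameters are deliberately naive stand-ins for a real chunking strategy:

def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    # Overlapping fixed-size chunks, so a fact that straddles a boundary
    # still appears intact in at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

corpus = {"doc_481": "A vector database stores embeddings and supports ANN search ..."}
records = []
for doc_id, text in corpus.items():
    for j, piece in enumerate(chunk(text)):
        records.append({
            "id": f"{doc_id}_chunk_{j}",
            "vector": embed(piece),            # a real pipeline calls the model here
            "metadata": {"document_id": doc_id},
            "payload": piece,
        })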

The quality of the system depends on two linked pieces:

  1. representation quality: does the embedding model preserve useful semantic relationships?
  2. search quality: does the index retrieve the right neighbors quickly enough?

A great vector database cannot rescue a bad embedding model. A great embedding model can still underperform if the index is misconfigured, the similarity metric is wrong, or the chunking/filtering strategy is poor.


4) History of vector databases: before and after generative AI

The history makes more sense if we separate the field into two eras.

Era A: Before generative AI became mainstream

Before the ChatGPT wave, the core problem was usually framed as nearest neighbor search or approximate nearest neighbor (ANN) search rather than “vector databases.”

Early foundation: nearest neighbor search and the curse of dimensionality

Classical search structures such as k-d trees and ball trees work well at lower dimensions, but performance degrades badly as dimensions grow. This is part of the classic curse of dimensionality: distance concentration and search-space explosion make exact high-dimensional search expensive.

That led researchers toward approximate methods, accepting a trade-off:

  • slightly lower recall
  • dramatically better speed and scale

2000s to mid-2010s: ANN research matures

Several major families of ANN methods became foundational:

  • tree-based partitioning for lower-dimensional settings
  • hashing approaches such as locality-sensitive hashing (LSH)
  • quantization methods such as product quantization (PQ) for compressing vectors and accelerating search
  • graph-based methods for highly effective approximate navigation

One major milestone was Product Quantization for Nearest Neighbor Search by Jégou, Douze, and Schmid (2011), which showed how to compress vectors into compact codes while preserving enough distance structure for fast search at scale.

Another major milestone was HNSW (Hierarchical Navigable Small World graphs) by Malkov and Yashunin (2016), which became one of the most influential ANN methods thanks to its strong recall-latency trade-off and practical performance.

In this era, the problem was already industrially important. The main workloads were:

  • recommendation systems
  • image similarity and visual search
  • ads and ranking
  • music and item similarity
  • catalog matching
  • anomaly detection
  • search and retrieval in specialized domains

Libraries before databases

Before purpose-built vector databases became prominent, teams often used ANN libraries directly:

  • Annoy from Spotify for approximate nearest neighbors
  • FAISS from Meta for dense vector similarity search and clustering
  • NMSLIB / hnswlib for efficient graph-based ANN

FAISS in particular became a foundational building block. It was not a full database, but it offered a broad family of indexes, compression schemes, GPU support, and benchmarking utilities. Many later vector database products either wrapped FAISS-like concepts or competed by offering a more operationally complete system.

Late 2010s to early 2020s: from ANN engines to vector-native systems

As embeddings spread beyond recommender systems into semantic search and multimodal ML, systems started adding more database-like capabilities:

  • persistence
  • sharding and replication
  • metadata filtering
  • REST/gRPC APIs
  • incremental ingestion
  • distributed execution

This is the period when systems such as Milvus, Weaviate, and later Qdrant began to gain visibility as dedicated vector-native platforms.

Era B: After generative AI became mainstream

Generative AI did not invent vector search, but it created a new default pattern:

  • LLMs are powerful but have bounded context windows and imperfect factual memory
  • external retrieval can inject relevant context at generation time
  • embeddings + vector retrieval become the bridge between a model and external knowledge

The RAG turning point

The 2020 RAG paper by Lewis et al. helped define the now-standard architecture of retrieval-augmented generation: a parametric language model paired with a non-parametric memory accessed through dense retrieval.

That idea turned vector retrieval into a first-class component of modern AI applications.

Once teams started building:

  • company knowledge chatbots
  • code search assistants
  • support copilots
  • memory for agents
  • long-document QA
  • multimodal search

vector search moved from being a specialist tool to a default architectural choice.

Why the GenAI boom changed the market

After 2022, several things happened at once:

  1. Embedding APIs became mainstream. It became easy for application teams to turn text into vectors.
  2. RAG became a standard design pattern. Teams needed fast retrieval over their own corpora.
  3. Agents needed memory. Not just one-shot retrieval, but recurrent storage and recall of facts, events, plans, and prior actions.
  4. Operational demand exploded. People no longer wanted an ANN library; they wanted an API, durability, filters, access control, observability, backups, hybrid search, and easy cloud deployment.

This is when the phrase vector database became much more commercially visible.

The post-GenAI shift in expectations

Before GenAI, success was often measured mostly by ANN metrics:

  • recall@k
  • latency
  • memory footprint
  • throughput

After GenAI, product expectations expanded to include:

  • document ingestion pipelines
  • chunking and re-indexing workflows
  • filtering by tenant, time, or permissions
  • hybrid lexical + vector retrieval
  • support for sparse and dense vectors
  • integration with orchestration frameworks and agent runtimes
  • online updates and background compaction

In other words, the field moved from nearest-neighbor algorithms to retrieval systems for AI applications.


5) Key implementation details: how vector databases actually work

This is the part most high-level explainers skip.

A production vector database is usually a layered system:

  1. ingest data and metadata
  2. generate or accept embeddings
  3. store vectors and payloads
  4. build one or more ANN indexes
  5. execute filtered similarity queries
  6. return ranked candidates
  7. keep the system up to date under inserts, deletes, re-indexing, and replication

Let’s unpack the main pieces.

5.1 Data model

At minimum, each record often looks like:

(id, vector, metadata, payload)

Example:

{
  "id": "doc_481_chunk_12",
  "vector": [0.013, -0.441, ...],
  "metadata": {
    "document_id": "doc_481",
    "source": "wiki",
    "tenant": "acme",
    "created_at": "2026-05-05",
    "access_level": "internal"
  },
  "payload": "A vector database stores embeddings and supports ANN search..."
}

Some systems store the raw payload internally. Others store only a pointer to object storage or another database.

5.2 Distance metrics

The similarity metric must match the embedding model and retrieval objective.

Common choices:

  • cosine similarity: angle-based similarity, common for text embeddings
  • inner product: often used when model training optimizes dot-product ranking
  • L2 distance: common in many ANN algorithms and image/feature workloads

A subtle but critical point:

If the embedding model was trained for cosine similarity, scoring with raw dot product or with L2 distance on unnormalized vectors can quietly degrade retrieval quality.

In some systems, cosine similarity is implemented by normalizing vectors and then using dot product.
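
That equivalence is easy to check, and it also shows why all three metrics agree on ranking once vectors are unit-normalized:

import numpy as np

a, b = np.random.rand(384), np.random.rand(384)

cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Normalize once at index time; cosine becomes a plain dot product at query time.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(cosine, a_n @ b_n)

# For unit vectors, squared L2 distance is 2 - 2*cos(a, b), a monotone
# transform of cosine, so L2 ranking matches cosine ranking too.
assert np.isclose(np.sum((a_n - b_n) ** 2), 2 - 2 * cosine)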

5.3 Exact vs approximate search

Exact search computes the true nearest neighbors. It is simple and accurate, but expensive at large scale.

Approximate nearest neighbor (ANN) search sacrifices some recall to get dramatically lower latency and better memory/computation trade-offs.

This is the central engineering compromise in vector retrieval.

The practical question is rarely “exact or approximate?” It is:

  • what recall can we afford?
  • at what latency?
  • at what memory cost?
  • under what update rate?

5.4 Major ANN index families

A. Flat / brute-force indexes

These store vectors directly and compare against all of them.

Pros:

  • exact results
  • simple implementation
  • easy updates

Cons:

  • expensive at scale

Useful when:

  • dataset is small
  • recall must be exact
  • reranking a small candidate set

B. IVF: inverted file indexes

IVF partitions vector space into coarse clusters. At query time, the system probes only the most relevant clusters instead of scanning everything.

Think of it as:

  1. learn cluster centroids
  2. assign each vector to a centroid/list
  3. at query time, find the nearest centroids
  4. search only within those lists

Pros:

  • scalable
  • tunable latency/recall via probe count
  • works well with quantization

Cons:

  • misses neighbors in unprobed clusters
  • quality depends on training and partitioning

C. PQ / OPQ: quantization-based indexes

Product Quantization (PQ) compresses vectors into short codes by splitting them into subspaces and quantizing each subspace independently.

This is powerful because memory bandwidth often becomes the bottleneck. Compressed codes let systems store far more vectors and compare them more efficiently.

Pros:

  • much lower memory footprint
  • enables billion-scale retrieval on limited hardware

Cons:

  • approximate distances introduce error
  • more training and tuning complexity

Many industrial systems combine IVF and PQ into IVF-PQ-style architectures.
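
Here is a minimal IVF-PQ sketch using FAISS; the parameter values are illustrative, not tuned recommendations:

import faiss
import numpy as np

d, n = 128, 100_000
xb = np.random.rand(n, d).astype("float32")

nlist, m, nbits = 1024, 16, 8            # 1024 coarse clusters; 16 subspaces x 8 bits
quantizer = faiss.IndexFlatL2(d)         # coarse quantizer for the IVF layer
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)                          # learn centroids and PQ codebooks
index.add(xb)                            # each vector stored as a 16-byte code

index.nprobe = 16                        # clusters probed per query: the recall/latency knob
D, I = index.search(xb[:5], 10)          # top-10 neighbors for 5 queries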

D. Graph-based indexes: HNSW and relatives

HNSW builds a multi-layer proximity graph. Search starts at upper layers for coarse navigation and descends toward lower layers for fine-grained local search.

Why it became so popular:

  • excellent recall-latency trade-off
  • strong practical performance on many dense retrieval workloads
  • good out-of-the-box behavior

Key knobs:

  • M: graph connectivity / out-degree budget
  • efConstruction: search breadth during index build
  • efSearch: search breadth at query time

Bigger values usually mean:

  • better recall
  • more memory
  • slower builds or queries
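
These knobs map directly onto hnswlib's API; the values below are common starting points rather than recommendations:

import hnswlib
import numpy as np

d, n = 128, 50_000
data = np.random.rand(n, d).astype("float32")

index = hnswlib.Index(space="cosine", dim=d)
index.init_index(max_elements=n, M=16, ef_construction=200)  # build-time knobs
index.add_items(data, np.arange(n))

index.set_ef(64)                          # efSearch: raise for recall, lower for latency
labels, distances = index.knn_query(data[:5], k=10)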

E. Disk-aware indexes

In-memory indexes are fast, but expensive. Disk-aware systems such as DiskANN showed that well-designed graph search combined with SSD-aware layouts can deliver high recall and low latency on datasets too large for RAM.

This matters because many real corpora are large enough that “just keep everything in memory” becomes financially painful.

5.5 Metadata filtering is harder than it sounds

A lot of GenAI applications need queries like:

  • nearest chunks for tenant = acme
  • only documents after date X
  • only files user Y can access
  • only source = handbook and language = en

This is not just vector search anymore. It is vector search under constraints.

There are several ways systems handle this:

  • pre-filter then search: filter IDs first, then search within that subset
  • search then post-filter: retrieve candidates, then apply filters
  • filter-aware indexes: encode partitions or payload-aware search paths
  • hybrid execution: combine scalar filtering and ANN in one plan

This is one of the biggest gaps between a simple ANN library and a serious database system.
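
A brute-force illustration of the first two strategies; note how post-filtering can silently return fewer than k hits when the filter is selective:

import numpy as np

n, d, k = 10_000, 64, 5
vecs = np.random.rand(n, d).astype(np.float32)
tenant = np.random.choice(["acme", "other"], size=n)
q = np.random.rand(d).astype(np.float32)
scores = vecs @ q                                   # similarity of every vector to the query

# Pre-filter then search: restrict to matching rows, then rank within them.
match = tenant == "acme"
pre_ids = np.flatnonzero(match)
pre_topk = pre_ids[np.argsort(-scores[match])[:k]]  # always k hits if enough rows match

# Search then post-filter: rank everything first, then drop non-matching hits.
cand = np.argsort(-scores)[:k]
post_topk = cand[tenant[cand] == "acme"]            # may contain fewer than k hits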

5.6 Sharding, partitioning, and replication

A production vector DB must scale across machines.

Typical strategies include:

  • sharding by tenant, collection, ID range, or hash
  • partitioning for locality and workload isolation
  • replication for availability and read scalability

The tricky part is that ANN indexes are not as easy to distribute as key-value lookups. Once data is split across shards, the system often must:

  1. fan the query out to multiple shards
  2. search locally on each shard
  3. merge candidate lists globally
  4. optionally rerank

This merge stage affects both latency and recall.
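
A sketch of that merge stage, assuming each shard has already returned its local top-k as (score, id) pairs sorted best-first:

import heapq

def merge_shard_results(shard_results, k):
    # heapq.merge wants ascending order, so merge on negated scores.
    merged = heapq.merge(*[[(-s, i) for s, i in r] for r in shard_results])
    return [(-neg, i) for neg, i in list(merged)[:k]]

shards = [
    [(0.92, "a1"), (0.80, "a2")],            # shard A's local top-k
    [(0.95, "b1"), (0.70, "b2")],            # shard B's local top-k
]
print(merge_shard_results(shards, k=3))      # [(0.95, 'b1'), (0.92, 'a1'), (0.80, 'a2')]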

5.7 Updates, deletions, and rebuilds

Many explainers assume a static dataset. Production systems are not static.

Real questions include:

  • how fast can new vectors be inserted?
  • are deletes logical or physical?
  • when is background compaction required?
  • when must the index be rebuilt?
  • what happens when the embedding model changes?

This last point is especially important:

If you switch to a materially different embedding model, you usually need to re-embed the corpus and rebuild the index.

The vector space itself changed. Old and new embeddings may no longer be meaningfully comparable.

5.8 Hybrid retrieval

Pure vector retrieval is not always enough.

Keyword search still matters for:

  • exact names
  • error codes
  • identifiers
  • rare acronyms
  • legal clauses
  • code symbols

That is why many production systems use hybrid retrieval:

  • lexical retrieval (BM25 / inverted index)
  • vector retrieval (dense or sparse embeddings)
  • fusion or reranking

This is often better than vector-only retrieval for enterprise search and RAG.
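
One common fusion method is reciprocal rank fusion (RRF), which merges ranked lists without requiring lexical and vector scores to be comparable; a minimal sketch:

def rrf(rankings, k=60):
    # Each list contributes 1 / (k + rank) per item; k=60 is the commonly cited default.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc_7", "doc_2", "doc_9"]        # BM25 ranking
dense = ["doc_2", "doc_5", "doc_7"]          # vector ranking
print(rrf([lexical, dense]))                 # doc_2 and doc_7 rise to the top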

5.9 Reranking

ANN search is often just the first stage.

A common stack is:

  1. retrieve top 50 or top 200 candidates with a fast vector index
  2. rerank with a stronger but slower model, often a cross-encoder
  3. send the best few chunks to the LLM

This staged design is important because a vector DB optimizes candidate generation, not necessarily final relevance.
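
A sketch of the rerank stage using the CrossEncoder class from sentence-transformers; the model name is one public checkpoint, used here purely as an example:

from sentence_transformers import CrossEncoder

query = "how do I rotate an API key?"
candidates = [                               # stage 1 output from the vector index
    "To rotate a key, open Settings and choose Regenerate.",
    "Our quarterly revenue grew in every region.",
    "API keys can be regenerated from the dashboard.",
]

# Stage 2: score each (query, chunk) pair jointly; slower but more accurate.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([(query, c) for c in candidates])

# Stage 3: keep only the best few chunks for the LLM prompt.
best = [c for _, c in sorted(zip(scores, candidates), reverse=True)][:2]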

5.10 Hardware considerations

Vector search performance is often limited by:

  • memory bandwidth
  • cache locality
  • SIMD efficiency
  • GPU utilization
  • SSD access patterns

This is why implementations obsess over:

  • contiguous memory layout
  • vector normalization strategy
  • quantized representations
  • batched search
  • GPU kernels
  • asynchronous I/O for disk-backed indexes

The field is partly about algorithms and partly about systems engineering.


6) Relationship with embedding models

A vector database without embeddings is just empty infrastructure.

Embedding models define the geometry

The embedding model determines:

  • vector dimension
  • similarity metric
  • what semantic relationships are preserved
  • domain transfer behavior
  • multilingual capability
  • robustness to chunk length and wording changes

If the model is poor for your domain, retrieval will be poor even with a state-of-the-art index.

Embeddings are not interchangeable forever

A common beginner mistake is assuming embeddings are like generic coordinates. They are not.

Embeddings from different models often live in different spaces. That means:

  • changing models usually requires re-embedding
  • mixing vectors from incompatible models can break retrieval
  • index hyperparameters may need retuning after model changes

Dense vs sparse vs hybrid embeddings

Not all modern retrieval is dense-only.

There are now three common regimes:

  • dense embeddings for semantic similarity
  • sparse learned representations for lexical salience and interpretability
  • hybrid systems combining both

This matters because many “vector databases” now support more than one retrieval modality.

Chunking is part of the embedding problem

For text RAG, chunking strategy is inseparable from vector DB performance.

Bad chunking causes:

  • semantic dilution
  • missing facts split across chunks
  • excessive redundancy
  • noisy retrieval

A vector database can only retrieve the chunks it was given.


7) Why Agentic AI needs vector databases

Agentic AI systems need more than one-off retrieval. They need usable external memory.

That memory usually falls into at least three buckets.

A. Knowledge memory

This is the classic RAG case:

  • product docs
  • wikis
  • policies
  • codebases
  • tickets
  • research papers

The vector DB helps the agent fetch relevant knowledge at runtime.

B. Episodic memory

This is memory about what happened before:

  • prior conversations
  • decisions made
  • failed attempts
  • task state
  • observations from tools
  • user preferences

This is where vector databases become important for agent continuity. The agent can retrieve semantically related prior episodes, not just exact-key records.

C. Working memory support

An agent has a bounded context window. A vector DB can help select which past items deserve to be reintroduced into context.

That turns the problem into context management:

  • what should be remembered long-term?
  • what should be summarized?
  • what should be retrieved right now?
  • what should be forgotten or decayed?

The right mental model

For Agentic AI, a vector database is usually not “the memory” by itself.

It is one layer in a broader memory system that may also include:

  • relational/task state stores
  • key-value caches
  • document stores
  • graph databases
  • transcript archives
  • summarization pipelines

A healthy agent stack often uses vector retrieval for semantic recall and another system for authoritative state.

What vector DBs are good at in agents

  • recalling semantically similar prior events
  • retrieving relevant external knowledge
  • surfacing examples and precedents
  • grounding answers with citations
  • selecting context under token limits

What they are bad at

  • exact transactional state
  • strict workflow correctness
  • canonical truth for mutable fields
  • precise counters, balances, or approvals

That distinction matters. If an agent needs to know whether a purchase order is approved, use a transactional store. If it needs to find similar past procurement issues, use vector retrieval.


8) Essentials for using vector DBs with Agentic AI

If you are building agent systems, these are the essentials.

8.1 Separate semantic memory from system-of-record state

Do not put everything into the vector DB.

Use it for:

  • fuzzy recall
  • semantic search
  • memory retrieval

Do not rely on it alone for:

  • workflow state
  • permissions truth
  • financial records
  • exact task status

8.2 Decide what gets embedded

Not every event deserves a vector.

Useful candidates:

  • finalized conversation turns
  • summaries of sessions
  • documents and chunks
  • tool outputs worth reusing
  • durable preferences
  • postmortems and decisions

Less useful candidates:

  • noisy intermediate tokens
  • every tiny action without filtering
  • ephemeral low-signal traces

Good memory systems usually involve selection and summarization, not raw dumping.

8.3 Attach metadata aggressively

Metadata is how you avoid retrieving nonsense.

Examples:

  • user or tenant ID
  • source type
  • timestamp
  • access policy
  • confidence
  • conversation ID
  • task ID
  • expiration / TTL

Without metadata, semantic search becomes operationally dangerous.

8.4 Use recency and relevance together

Agent memory retrieval is usually not just semantic similarity. It is often some blend of:

  • semantic relevance
  • recency
  • importance
  • source trust
  • user/tenant match

Many practical systems apply a custom ranking layer on top of ANN results.
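
A sketch of one such blend; the weights and the exponential recency decay are illustrative policy choices, not established constants:

import time

def memory_score(similarity, created_at, importance,
                 half_life_days=7.0, w_sim=0.6, w_rec=0.25, w_imp=0.15):
    # Blend semantic similarity with recency and importance.
    age_days = (time.time() - created_at) / 86_400
    recency = 0.5 ** (age_days / half_life_days)   # halves every half_life_days
    return w_sim * similarity + w_rec * recency + w_imp * importance

# Re-rank ANN hits: (similarity from the index, creation time, importance 0-1).
hits = [
    (0.82, time.time() - 30 * 86_400, 0.9),        # older but important memory
    (0.78, time.time() - 1 * 86_400, 0.2),         # fresh but minor memory
]
ranked = sorted(hits, key=lambda h: memory_score(*h), reverse=True)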

8.5 Re-rank before injecting into context

The cost of passing irrelevant chunks into an LLM is high:

  • wasted tokens
  • distraction
  • hallucination risk
  • diluted answer quality

Use reranking or heuristics before final context assembly.

8.6 Expect reindexing as part of life

Embedding models improve quickly. Your corpus changes. Your schema changes. Your memory policy changes.

Re-embedding and reindexing are not exceptional events. They are normal maintenance.

8.7 Evaluate with application metrics, not only ANN metrics

Recall@10 and latency matter, but the real questions are:

  • does the agent answer better?
  • are citations more accurate?
  • does the memory retrieval help task completion?
  • does the agent recall the right past decisions?

A vector DB is successful only if the overall AI system improves.


9) Which systems matter today: open source and commercial

The market has become crowded, but the landscape is easier to understand if we group systems by style.

Open source vector databases and vector-native systems

1. FAISS

Technically a library rather than a full database, but still foundational.

Strengths:

  • rich index families
  • GPU support
  • strong research pedigree
  • excellent for custom pipelines

Trade-off:

  • you must build more of the operational database layer yourself

2. Milvus

One of the best-known open source vector database platforms.

Strengths:

  • distributed architecture
  • multiple index options
  • strong ecosystem presence
  • designed as a database/service, not just a library

3. Weaviate

Popular for developer-friendly vector search with integrated APIs and AI-oriented workflow support.

Strengths:

  • easy developer experience
  • metadata and schema support
  • strong positioning around GenAI applications

4. Qdrant

A well-regarded open source vector database with a reputation for pragmatic design and strong filtering support.

Strengths:

  • payload-aware retrieval
  • clean APIs
  • good production ergonomics

5. pgvector

An extension that brings vector similarity search into PostgreSQL.

Strengths:

  • uses existing Postgres operational model
  • great when your workload already lives in Postgres
  • easy adoption path for many teams

Trade-off:

  • not always the best choice for the largest or most latency-sensitive vector workloads

6. LanceDB

Interesting for local-first and analytics-adjacent workflows with columnar/storage-aware design.

7. Elasticsearch / OpenSearch vector capabilities

Not pure vector databases, but widely used because enterprises already run them.

Strengths:

  • hybrid search
  • operational familiarity
  • strong filtering and text search

Trade-off:

  • vector retrieval is part of a broader search platform rather than a dedicated vector-native architecture

Commercial / managed vector database offerings

1. Pinecone

Probably the most visible dedicated managed vector database brand from the GenAI era.

Strengths:

  • managed service simplicity
  • strong GenAI/RAG positioning
  • good developer experience

2. Weaviate Cloud / managed offerings

Managed deployment path for teams that want the Weaviate model without self-hosting.

3. Qdrant Cloud

Managed deployment option for Qdrant users.

4. Milvus-based managed services, including Zilliz Cloud

Useful for teams that want Milvus capabilities in managed form.

5. Cloud platform offerings

Major cloud providers increasingly offer vector search inside broader platforms:

  • vector support in managed databases
  • search platforms with ANN support
  • AI knowledge-base products backed by vector retrieval

This blurs the line between “vector database” and “vector capability inside a general database/search product.”

The practical selection rule

Pick based on workload, not hype.

Questions that matter more than branding:

  • data size
  • update frequency
  • filter complexity
  • multi-tenancy needs
  • operational preferences
  • exact vs approximate requirements
  • hybrid search needs
  • whether you already run Postgres / Elasticsearch / a cloud-native stack

For many teams, the real competition is not “vector DB A vs vector DB B.” It is:

  • dedicated vector database
  • Postgres with pgvector
  • search engine with vector support
  • custom FAISS-based service

10) Design trade-offs that matter more than marketing

The vector DB space has plenty of hype. The durable questions are still the old systems questions.

Recall vs latency

Higher recall usually costs more compute or memory.

Memory vs accuracy

Compression and smaller indexes save money, but often reduce accuracy.

Freshness vs optimization

Highly optimized indexes can be expensive to update online.

Simplicity vs flexibility

A simple managed service is easier to adopt; a lower-level library can be tuned more aggressively.

Semantic retrieval vs exact constraints

Pure vector search is elegant in demos. Real enterprise systems need permissions, dates, structured filters, and exact match behavior.


11) What happens next

The future of vector databases is probably not “just more HNSW.”

A few trends are already visible:

  • hybrid dense + sparse retrieval becomes standard
  • multimodal indexing becomes normal rather than special
  • disk-aware and tiered-storage systems matter more as corpora grow
  • streaming and dynamic indexing become more important for constantly changing knowledge bases
  • agent memory systems demand better support for episodic recall, decay, summarization, and retrieval policies
  • the boundary between vector DB, search engine, and general database with vector support keeps blurring

Benchmarking is also becoming more realistic. The Big ANN challenge and related efforts show that future evaluation must cover more than static dense search. Filtering, sparse retrieval, OOD queries, and streaming updates all matter in real systems.


Conclusion

A vector database is not magic. It is the convergence of two old ideas and one new market force:

  • old idea #1: embeddings map meaning into geometry
  • old idea #2: approximate nearest-neighbor search makes high-dimensional retrieval practical
  • new force: generative AI and agents made external semantic retrieval a mainstream application primitive

That is why vector databases suddenly feel ubiquitous.

They are not replacing relational databases. They are not replacing search engines. And they are not a complete memory architecture by themselves.

What they are is a specialized retrieval layer for embedding-based systems. In the age of RAG and Agentic AI, that layer has become essential.

If you are building agents, the best way to think about vector databases is simple:

use them for semantic recall, not as a substitute for every other data system.

That mental model will save you a lot of architectural pain.


References

  1. Le Ma et al. A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge. arXiv, 2025. https://arxiv.org/html/2310.11703v2
  2. Wenqi Li et al. A Survey of Vector Database Management Systems. arXiv, 2023. https://arxiv.org/abs/2310.14021
  3. Hervé Jégou, Matthijs Douze, and Cordelia Schmid. Product Quantization for Nearest Neighbor Search. IEEE TPAMI, 2011.
  4. Yu A. Malkov and Dmitry A. Yashunin. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. arXiv:1603.09320, 2016. https://arxiv.org/abs/1603.09320
  5. Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. arXiv:1702.08734, 2017. https://arxiv.org/abs/1702.08734
  6. FAISS project documentation and repository. https://github.com/facebookresearch/faiss
  7. Subramanya et al. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. Microsoft Research / NeurIPS, 2019. https://www.microsoft.com/en-us/research/publication/diskann-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node/
  8. Patrick Lewis et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401. https://arxiv.org/abs/2005.11401
  9. Harsha Vardhan Simhadri et al. Results of the Big ANN: NeurIPS’23 competition. arXiv:2409.17424. https://arxiv.org/abs/2409.17424
  10. Milvus documentation. https://milvus.io/
  11. Weaviate documentation. https://weaviate.io/
  12. Qdrant documentation. https://qdrant.tech/
  13. pgvector repository. https://github.com/pgvector/pgvector
  14. Pinecone learning materials and product documentation. https://www.pinecone.io/