A lot of the recent attention on vector databases makes them sound like a brand-new invention created by the generative AI boom. That is not really true.

What changed is not the underlying math. What changed is the workload.

For more than a decade, industry and academia had already been working on large-scale nearest-neighbor search for recommendation systems, image retrieval, search, ads, and ranking. The generative AI wave did something different: it turned vector retrieval from a specialized backend capability into a mainstream application primitive. Once teams started building retrieval-augmented generation (RAG), long-term AI memory, semantic search, and tool-using agents, vector databases stopped being niche infrastructure and became part of the standard stack.

This post is a deep technical guide to that shift:

  1. what a vector database actually is
  2. how the field evolved before and after generative AI became popular
  3. the key implementation details that matter in practice
  4. how vector databases relate to embedding models
  5. how they fit into Agentic AI systems for memory, knowledge, and context management
  6. which open source and commercial systems matter today

1) What is a vector database?

A vector database is a storage and retrieval system optimized for high-dimensional embeddings and similarity search.

Instead of retrieving rows primarily by exact keys or boolean predicates, a vector database retrieves items by closeness in vector space. That closeness usually means one of these distance/similarity functions:

  • cosine similarity
  • dot product / inner product
  • Euclidean (L2) distance

In practice, a vector database stores something like this for each item:

  • an ID
  • a vector embedding
  • optional payload/metadata such as source, author, tenant, tags, time range, permissions
  • optional raw text or object reference

A query is often:

  1. take a query string, image, or event
  2. convert it into an embedding using a model
  3. search for the nearest vectors
  4. optionally apply metadata filters
  5. return top-k candidates
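
As a sketch, here is that flow as a toy, self-contained Python example. The embed() function below is a fake stand-in for a real embedding model, and nothing here is any specific product's API:

import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Fake embedding model: deterministic pseudo-random unit vector per text.
    rng = np.random.default_rng(sum(text.encode()))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# Steps 1-2: take a query and convert it into an embedding.
query_vec = embed("how do I reset my password?")

# A tiny "collection" of stored items with vectors and metadata.
items = [
    {"id": "doc_1", "vec": embed("password reset steps"), "tenant": "acme"},
    {"id": "doc_2", "vec": embed("quarterly revenue report"), "tenant": "acme"},
    {"id": "doc_3", "vec": embed("reset a password via email"), "tenant": "other"},
]

# Step 4: apply a metadata filter; steps 3 and 5: rank by similarity, take top-k.
candidates = [it for it in items if it["tenant"] == "acme"]
candidates.sort(key=lambda it: float(query_vec @ it["vec"]), reverse=True)
print([it["id"] for it in candidates[:2]])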

So the short definition is:

A vector database is a database or retrieval engine that stores embeddings and supports fast approximate nearest-neighbor search, usually with metadata filtering, updates, and operational capabilities for production use.

That last clause matters. A vector index is not the same thing as a vector database.

  • Vector index = the search data structure, such as HNSW or IVF-PQ
  • Vector database = the operational system around the index: ingestion, persistence, filters, replication, access control, updates, observability, APIs, multi-tenancy, backups, and sometimes hybrid retrieval

If you want an intuition: a vector database is to embeddings what an inverted index is to keyword search. It is the infrastructure layer that makes semantic retrieval fast enough and manageable enough to use in real systems.


2) Why ordinary databases are not enough

A traditional relational database is excellent at questions like:

  • find customer 12345
  • return invoices created after May 1
  • join orders with order_items
  • filter users where country = “DE” and status = “active”

But semantic retrieval asks a different question:

  • find chunks that mean roughly the same thing as this query
  • find similar images
  • find related past incidents
  • find memory entries that are conceptually close even when wording differs

You can store embeddings in a normal database table. That does not automatically solve the hard part: efficiently searching millions or billions of high-dimensional vectors with low latency.

The brute-force approach is exact search:

  • compute distance from the query to every vector
  • sort results
  • return top-k

That is perfectly fine for small collections. It becomes expensive at production scale.
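
To make the cost concrete, here is exact top-k search in plain NumPy over random data; the full scan on the distance line is exactly what stops scaling as n grows:

import numpy as np

n, d, k = 100_000, 768, 10
db = np.random.rand(n, d).astype(np.float32)   # ~300 MB here; at billions, RAM alone hurts
q = np.random.rand(d).astype(np.float32)

# Compute the distance from the query to every vector: O(n * d) work per query.
dists = np.linalg.norm(db - q, axis=1)

# Take top-k: argpartition avoids a full O(n log n) sort.
idx = np.argpartition(dists, k)[:k]
topk = idx[np.argsort(dists[idx])]
print(topk, dists[topk])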

The entire vector database field exists because high-dimensional similarity search needs specialized indexing strategies, memory layouts, compression techniques, and query execution paths.


3) Where the vectors come from: embedding models

Vector databases only make sense because of embedding models.

An embedding model maps data into a vector space such that semantically similar objects are placed near each other. The object could be:

  • text
  • code
  • image
  • audio
  • video
  • user behavior
  • graph nodes
  • products or documents

For text systems, the usual workflow is:

  • split documents into chunks
  • embed each chunk into a dense vector
  • store vectors plus metadata in the database
  • embed the user query
  • retrieve nearest chunks

This is dense retrieval.
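
A minimal sketch of the ingestion side, reusing the toy embed() from section 1. The fixed-size character chunker and its parameters are deliberately naive stand-ins for a real chunking strategy:

def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    # Overlapping fixed-size chunks, so a fact that straddles a boundary
    # still appears intact in at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

corpus = {"doc_481": "A vector database stores embeddings and supports ANN search ..."}
records = []
for doc_id, text in corpus.items():
    for j, piece in enumerate(chunk(text)):
        records.append({
            "id": f"{doc_id}_chunk_{j}",
            "vector": embed(piece),            # a real pipeline calls the model here
            "metadata": {"document_id": doc_id},
            "payload": piece,
        })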

The quality of the system depends on two linked pieces:

  1. representation quality: does the embedding model preserve useful semantic relationships?
  2. search quality: does the index retrieve the right neighbors quickly enough?

A great vector database cannot rescue a bad embedding model. A great embedding model can still underperform if the index is misconfigured, the similarity metric is wrong, or the chunking/filtering strategy is poor.


4) History of vector databases: before and after generative AI

The history makes more sense if we separate the field into two eras.

Era A: Before generative AI became mainstream

Before the ChatGPT wave, the core problem was usually framed as nearest neighbor search or approximate nearest neighbor (ANN) search rather than “vector databases.”

Early foundation: nearest neighbor search and the curse of dimensionality

Classical search structures such as k-d trees and ball trees work well at lower dimensions, but performance degrades badly as dimensions grow. This is part of the classic curse of dimensionality: distance concentration and search-space explosion make exact high-dimensional search expensive.

That led researchers toward approximate methods, accepting a trade-off:

  • slightly lower recall
  • dramatically better speed and scale

2000s to mid-2010s: ANN research matures

Several major families of ANN methods became foundational:

  • tree-based partitioning for lower-dimensional settings
  • hashing approaches such as locality-sensitive hashing (LSH)
  • quantization methods such as product quantization (PQ) for compressing vectors and accelerating search
  • graph-based methods for highly effective approximate navigation

One major milestone was Product Quantization for Nearest Neighbor Search by Jégou, Douze, and Schmid (2011), which showed how to compress vectors into compact codes while preserving enough distance structure for fast search at scale.

Another major milestone was HNSW (Hierarchical Navigable Small World graphs) by Malkov and Yashunin (2016), which became one of the most influential ANN methods thanks to its strong recall-latency trade-off and practical performance.

In this era, the problem was already industrially important. The main workloads were:

  • recommendation systems
  • image similarity and visual search
  • ads and ranking
  • music and item similarity
  • catalog matching
  • anomaly detection
  • search and retrieval in specialized domains

Libraries before databases

Before purpose-built vector databases became prominent, teams often used ANN libraries directly:

  • Annoy from Spotify for approximate nearest neighbors
  • FAISS from Meta for dense vector similarity search and clustering
  • NMSLIB / hnswlib for efficient graph-based ANN

FAISS in particular became a foundational building block. It was not a full database, but it offered a broad family of indexes, compression schemes, GPU support, and benchmarking utilities. Many later vector database products either wrapped FAISS-like concepts or competed by offering a more operationally complete system.

Late 2010s to early 2020s: from ANN engines to vector-native systems

As embeddings spread beyond recommender systems into semantic search and multimodal ML, systems started adding more database-like capabilities:

  • persistence
  • sharding and replication
  • metadata filtering
  • REST/gRPC APIs
  • incremental ingestion
  • distributed execution

This is the period when systems such as Milvus, Weaviate, and later Qdrant began to gain visibility as dedicated vector-native platforms.

Era B: After generative AI became mainstream

Generative AI did not invent vector search, but it created a new default pattern:

  • LLMs are powerful but have bounded context windows and imperfect factual memory
  • external retrieval can inject relevant context at generation time
  • embeddings + vector retrieval become the bridge between a model and external knowledge

The RAG turning point

The 2020 RAG paper by Lewis et al. helped define the now-standard architecture of retrieval-augmented generation: a parametric language model paired with a non-parametric memory accessed through dense retrieval.

That idea turned vector retrieval into a first-class component of modern AI applications.

Once teams started building:

  • company knowledge chatbots
  • code search assistants
  • support copilots
  • memory for agents
  • long-document QA
  • multimodal search

vector search moved from being a specialist tool to a default architectural choice.

Why the GenAI boom changed the market

After 2022, several things happened at once:

  1. Embedding APIs became mainstream. It became easy for application teams to turn text into vectors.
  2. RAG became a standard design pattern. Teams needed fast retrieval over their own corpora.
  3. Agents needed memory. Not just one-shot retrieval, but recurrent storage and recall of facts, events, plans, and prior actions.
  4. Operational demand exploded. People no longer wanted an ANN library; they wanted an API, durability, filters, access control, observability, backups, hybrid search, and easy cloud deployment.

This is when the phrase vector database became much more commercially visible.

The post-GenAI shift in expectations

Before GenAI, success was often measured mostly by ANN metrics:

  • recall@k
  • latency
  • memory footprint
  • throughput

After GenAI, product expectations expanded to include:

  • document ingestion pipelines
  • chunking and re-indexing workflows
  • filtering by tenant, time, or permissions
  • hybrid lexical + vector retrieval
  • support for sparse and dense vectors
  • integration with orchestration frameworks and agent runtimes
  • online updates and background compaction

In other words, the field moved from nearest-neighbor algorithms to retrieval systems for AI applications.


5) Key implementation details: how vector databases actually work

This is the part most high-level explainers skip.

A production vector database is usually a layered system:

  1. ingest data and metadata
  2. generate or accept embeddings
  3. store vectors and payloads
  4. build one or more ANN indexes
  5. execute filtered similarity queries
  6. return ranked candidates
  7. keep the system up to date under inserts, deletes, re-indexing, and replication

Let’s unpack the main pieces.

5.1 Data model

At minimum, each record often looks like:

(id, vector, metadata, payload)

Example:

{
  "id": "doc_481_chunk_12",
  "vector": [0.013, -0.441, ...],
  "metadata": {
    "document_id": "doc_481",
    "source": "wiki",
    "tenant": "acme",
    "created_at": "2026-05-05",
    "access_level": "internal"
  },
  "payload": "A vector database stores embeddings and supports ANN search..."
}

Some systems store the raw payload internally. Others store only a pointer to object storage or another database.

5.2 Distance metrics

The similarity metric must match the embedding model and retrieval objective.

Common choices:

  • cosine similarity: angle-based similarity, common for text embeddings
  • inner product: often used when model training optimizes dot-product ranking
  • L2 distance: common in many ANN algorithms and image/feature workloads

A subtle but critical point:

If the embedding model was trained for cosine similarity, scoring with raw dot product or with L2 distance on unnormalized vectors can quietly degrade retrieval quality.

In some systems, cosine similarity is implemented by normalizing vectors and then using dot product.
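
That equivalence is easy to check, and it also shows why all three metrics agree on ranking once vectors are unit-normalized:

import numpy as np

a, b = np.random.rand(384), np.random.rand(384)

cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Normalize once at index time; cosine becomes a plain dot product at query time.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(cosine, a_n @ b_n)

# For unit vectors, squared L2 distance is 2 - 2*cos(a, b), a monotone
# transform of cosine, so L2 ranking matches cosine ranking too.
assert np.isclose(np.sum((a_n - b_n) ** 2), 2 - 2 * cosine)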

5.3 Exact vs approximate search

Exact search computes the true nearest neighbors. It is simple and accurate, but expensive at large scale.

Approximate nearest neighbor (ANN) search sacrifices some recall to get dramatically lower latency and better memory/computation trade-offs.

This is the central engineering compromise in vector retrieval.

The practical question is rarely “exact or approximate?” It is:

  • what recall can we afford?
  • at what latency?
  • at what memory cost?
  • under what update rate?

5.4 Major ANN index families

A. Flat / brute-force indexes

These store vectors directly and compare against all of them.

Pros:

  • exact results
  • simple implementation
  • easy updates

Cons:

  • expensive at scale

Useful when:

  • dataset is small
  • recall must be exact
  • reranking a small candidate set

B. IVF: inverted file indexes

IVF partitions vector space into coarse clusters. At query time, the system probes only the most relevant clusters instead of scanning everything.

Think of it as:

  1. learn cluster centroids
  2. assign each vector to a centroid/list
  3. at query time, find the nearest centroids
  4. search only within those lists

Pros:

  • scalable
  • tunable latency/recall via probe count
  • works well with quantization

Cons:

  • misses neighbors in unprobed clusters
  • quality depends on training and partitioning

C. PQ / OPQ: quantization-based indexes

Product Quantization (PQ) compresses vectors into short codes by splitting them into subspaces and quantizing each subspace independently.

This is powerful because memory bandwidth often becomes the bottleneck. Compressed codes let systems store far more vectors and compare them more efficiently.

Pros:

  • much lower memory footprint
  • enables billion-scale retrieval on limited hardware

Cons:

  • approximate distances introduce error
  • more training and tuning complexity

Many industrial systems combine IVF and PQ into IVF-PQ-style architectures.
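
Here is a minimal IVF-PQ sketch using FAISS; the parameter values are illustrative, not tuned recommendations:

import faiss
import numpy as np

d, n = 128, 100_000
xb = np.random.rand(n, d).astype("float32")

nlist, m, nbits = 1024, 16, 8            # 1024 coarse clusters; 16 subspaces x 8 bits
quantizer = faiss.IndexFlatL2(d)         # coarse quantizer for the IVF layer
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)                          # learn centroids and PQ codebooks
index.add(xb)                            # each vector stored as a 16-byte code

index.nprobe = 16                        # clusters probed per query: the recall/latency knob
D, I = index.search(xb[:5], 10)          # top-10 neighbors for 5 queries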

D. Graph-based indexes: HNSW and relatives

HNSW builds a multi-layer proximity graph. Search starts at upper layers for coarse navigation and descends toward lower layers for fine-grained local search.

Why it became so popular:

  • excellent recall-latency trade-off
  • strong practical performance on many dense retrieval workloads
  • good out-of-the-box behavior

Key knobs:

  • M: graph connectivity / out-degree budget
  • efConstruction: search breadth during index build
  • efSearch: search breadth at query time

Bigger values usually mean:

  • better recall
  • more memory
  • slower builds or queries
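
These knobs map directly onto hnswlib's API; the values below are common starting points rather than recommendations:

import hnswlib
import numpy as np

d, n = 128, 50_000
data = np.random.rand(n, d).astype("float32")

index = hnswlib.Index(space="cosine", dim=d)
index.init_index(max_elements=n, M=16, ef_construction=200)  # build-time knobs
index.add_items(data, np.arange(n))

index.set_ef(64)                          # efSearch: raise for recall, lower for latency
labels, distances = index.knn_query(data[:5], k=10)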

E. Disk-aware indexes

In-memory indexes are fast, but expensive. Disk-aware systems such as DiskANN showed that well-designed graph search combined with SSD-aware layouts can deliver high recall and low latency on datasets too large for RAM.

This matters because many real corpora are large enough that “just keep everything in memory” becomes financially painful.

5.5 Metadata filtering is harder than it sounds

A lot of GenAI applications need queries like:

  • nearest chunks for tenant = acme
  • only documents after date X
  • only files user Y can access
  • only source = handbook and language = en

This is not just vector search anymore. It is vector search under constraints.

There are several ways systems handle this:

  • pre-filter then search: filter IDs first, then search within that subset
  • search then post-filter: retrieve candidates, then apply filters
  • filter-aware indexes: encode partitions or payload-aware search paths
  • hybrid execution: combine scalar filtering and ANN in one plan

This is one of the biggest gaps between a simple ANN library and a serious database system.
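
A brute-force illustration of the first two strategies; note how post-filtering can silently return fewer than k hits when the filter is selective:

import numpy as np

n, d, k = 10_000, 64, 5
vecs = np.random.rand(n, d).astype(np.float32)
tenant = np.random.choice(["acme", "other"], size=n)
q = np.random.rand(d).astype(np.float32)
scores = vecs @ q                                   # similarity of every vector to the query

# Pre-filter then search: restrict to matching rows, then rank within them.
match = tenant == "acme"
pre_ids = np.flatnonzero(match)
pre_topk = pre_ids[np.argsort(-scores[match])[:k]]  # always k hits if enough rows match

# Search then post-filter: rank everything first, then drop non-matching hits.
cand = np.argsort(-scores)[:k]
post_topk = cand[tenant[cand] == "acme"]            # may contain fewer than k hits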

5.6 Sharding, partitioning, and replication

A production vector DB must scale across machines.

Typical strategies include:

  • sharding by tenant, collection, ID range, or hash
  • partitioning for locality and workload isolation
  • replication for availability and read scalability

The tricky part is that ANN indexes are not as easy to distribute as key-value lookups. Once data is split across shards, the system often must:

  1. fan the query out to multiple shards
  2. search locally on each shard
  3. merge candidate lists globally
  4. optionally rerank

This merge stage affects both latency and recall.
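
A sketch of that merge stage, assuming each shard has already returned its local top-k as (score, id) pairs sorted best-first:

import heapq

def merge_shard_results(shard_results, k):
    # heapq.merge wants ascending order, so merge on negated scores.
    merged = heapq.merge(*[[(-s, i) for s, i in r] for r in shard_results])
    return [(-neg, i) for neg, i in list(merged)[:k]]

shards = [
    [(0.92, "a1"), (0.80, "a2")],            # shard A's local top-k
    [(0.95, "b1"), (0.70, "b2")],            # shard B's local top-k
]
print(merge_shard_results(shards, k=3))      # [(0.95, 'b1'), (0.92, 'a1'), (0.80, 'a2')]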

5.7 Updates, deletions, and rebuilds

Many explainers assume a static dataset. Production systems are not static.

Real questions include:

  • how fast can new vectors be inserted?
  • are deletes logical or physical?
  • when is background compaction required?
  • when must the index be rebuilt?
  • what happens when the embedding model changes?

This last point is especially important:

If you switch to a materially different embedding model, you usually need to re-embed the corpus and rebuild the index.

The vector space itself changed. Old and new embeddings may no longer be meaningfully comparable.

5.8 Hybrid retrieval

Pure vector retrieval is not always enough.

Keyword search still matters for:

  • exact names
  • error codes
  • identifiers
  • rare acronyms
  • legal clauses
  • code symbols

That is why many production systems use hybrid retrieval:

  • lexical retrieval (BM25 / inverted index)
  • vector retrieval (dense or sparse embeddings)
  • fusion or reranking

This is often better than vector-only retrieval for enterprise search and RAG.
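
One common fusion method is reciprocal rank fusion (RRF), which merges ranked lists without requiring lexical and vector scores to be comparable; a minimal sketch:

def rrf(rankings, k=60):
    # Each list contributes 1 / (k + rank) per item; k=60 is the commonly cited default.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc_7", "doc_2", "doc_9"]        # BM25 ranking
dense = ["doc_2", "doc_5", "doc_7"]          # vector ranking
print(rrf([lexical, dense]))                 # doc_2 and doc_7 rise to the top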

5.9 Reranking

ANN search is often just the first stage.

A common stack is:

  1. retrieve top 50 or top 200 candidates with a fast vector index
  2. rerank with a stronger but slower model, often a cross-encoder
  3. send the best few chunks to the LLM

This staged design is important because a vector DB optimizes candidate generation, not necessarily final relevance.
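
A sketch of the rerank stage using the CrossEncoder class from sentence-transformers; the model name is one public checkpoint, used here purely as an example:

from sentence_transformers import CrossEncoder

query = "how do I rotate an API key?"
candidates = [                               # stage 1 output from the vector index
    "To rotate a key, open Settings and choose Regenerate.",
    "Our quarterly revenue grew in every region.",
    "API keys can be regenerated from the dashboard.",
]

# Stage 2: score each (query, chunk) pair jointly; slower but more accurate.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([(query, c) for c in candidates])

# Stage 3: keep only the best few chunks for the LLM prompt.
best = [c for _, c in sorted(zip(scores, candidates), reverse=True)][:2]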

5.10 Hardware considerations

Vector search performance is often limited by:

  • memory bandwidth
  • cache locality
  • SIMD efficiency
  • GPU utilization
  • SSD access patterns

This is why implementations obsess over:

  • contiguous memory layout
  • vector normalization strategy
  • quantized representations
  • batched search
  • GPU kernels
  • asynchronous I/O for disk-backed indexes

The field is partly about algorithms and partly about systems engineering.


6) Relationship with embedding models

A vector database without embeddings is just empty infrastructure.

Embedding models define the geometry

The embedding model determines:

  • vector dimension
  • similarity metric
  • what semantic relationships are preserved
  • domain transfer behavior
  • multilingual capability
  • robustness to chunk length and wording changes

If the model is poor for your domain, retrieval will be poor even with a state-of-the-art index.

Embeddings are not interchangeable forever

A common beginner mistake is assuming embeddings are like generic coordinates. They are not.

Embeddings from different models often live in different spaces. That means:

  • changing models usually requires re-embedding
  • mixing vectors from incompatible models can break retrieval
  • index hyperparameters may need retuning after model changes

Dense vs sparse vs hybrid embeddings

Not all modern retrieval is dense-only.

There are now three common regimes:

  • dense embeddings for semantic similarity
  • sparse learned representations for lexical salience and interpretability
  • hybrid systems combining both

This matters because many “vector databases” now support more than one retrieval modality.

Chunking is part of the embedding problem

For text RAG, chunking strategy is inseparable from vector DB performance.

Bad chunking causes:

  • semantic dilution
  • missing facts split across chunks
  • excessive redundancy
  • noisy retrieval

A vector database can only retrieve the chunks it was given.


7) Why Agentic AI needs vector databases

Agentic AI systems need more than one-off retrieval. They need usable external memory.

That memory usually falls into at least three buckets.

A. Knowledge memory

This is the classic RAG case:

  • product docs
  • wikis
  • policies
  • codebases
  • tickets
  • research papers

The vector DB helps the agent fetch relevant knowledge at runtime.

B. Episodic memory

This is memory about what happened before:

  • prior conversations
  • decisions made
  • failed attempts
  • task state
  • observations from tools
  • user preferences

This is where vector databases become important for agent continuity. The agent can retrieve semantically related prior episodes, not just exact-key records.

C. Working memory support

An agent has a bounded context window. A vector DB can help select which past items deserve to be reintroduced into context.

That turns the problem into context management:

  • what should be remembered long-term?
  • what should be summarized?
  • what should be retrieved right now?
  • what should be forgotten or decayed?

The right mental model

For Agentic AI, a vector database is usually not “the memory” by itself.

It is one layer in a broader memory system that may also include:

  • relational/task state stores
  • key-value caches
  • document stores
  • graph databases
  • transcript archives
  • summarization pipelines

A healthy agent stack often uses vector retrieval for semantic recall and another system for authoritative state.

What vector DBs are good at in agents

  • recalling semantically similar prior events
  • retrieving relevant external knowledge
  • surfacing examples and precedents
  • grounding answers with citations
  • selecting context under token limits

What they are bad at

  • exact transactional state
  • strict workflow correctness
  • canonical truth for mutable fields
  • precise counters, balances, or approvals

That distinction matters. If an agent needs to know whether a purchase order is approved, use a transactional store. If it needs to find similar past procurement issues, use vector retrieval.


8) Essentials for using vector DBs with Agentic AI

If you are building agent systems, these are the essentials.

8.1 Separate semantic memory from system-of-record state

Do not put everything into the vector DB.

Use it for:

  • fuzzy recall
  • semantic search
  • memory retrieval

Do not rely on it alone for:

  • workflow state
  • permissions truth
  • financial records
  • exact task status

8.2 Decide what gets embedded

Not every event deserves a vector.

Useful candidates:

  • finalized conversation turns
  • summaries of sessions
  • documents and chunks
  • tool outputs worth reusing
  • durable preferences
  • postmortems and decisions

Less useful candidates:

  • noisy intermediate tokens
  • every tiny action without filtering
  • ephemeral low-signal traces

Good memory systems usually involve selection and summarization, not raw dumping.

8.3 Attach metadata aggressively

Metadata is how you avoid retrieving nonsense.

Examples:

  • user or tenant ID
  • source type
  • timestamp
  • access policy
  • confidence
  • conversation ID
  • task ID
  • expiration / TTL

Without metadata, semantic search becomes operationally dangerous.

8.4 Use recency and relevance together

Agent memory retrieval is usually not just semantic similarity. It is often some blend of:

  • semantic relevance
  • recency
  • importance
  • source trust
  • user/tenant match

Many practical systems apply a custom ranking layer on top of ANN results.
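
A sketch of one such blend; the weights and the exponential recency decay are illustrative policy choices, not established constants:

import time

def memory_score(similarity, created_at, importance,
                 half_life_days=7.0, w_sim=0.6, w_rec=0.25, w_imp=0.15):
    # Blend semantic similarity with recency and importance.
    age_days = (time.time() - created_at) / 86_400
    recency = 0.5 ** (age_days / half_life_days)   # halves every half_life_days
    return w_sim * similarity + w_rec * recency + w_imp * importance

# Re-rank ANN hits: (similarity from the index, creation time, importance 0-1).
hits = [
    (0.82, time.time() - 30 * 86_400, 0.9),        # older but important memory
    (0.78, time.time() - 1 * 86_400, 0.2),         # fresh but minor memory
]
ranked = sorted(hits, key=lambda h: memory_score(*h), reverse=True)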

8.5 Re-rank before injecting into context

The cost of passing irrelevant chunks into an LLM is high:

  • wasted tokens
  • distraction
  • hallucination risk
  • diluted answer quality

Use reranking or heuristics before final context assembly.

8.6 Expect reindexing as part of life

Embedding models improve quickly. Your corpus changes. Your schema changes. Your memory policy changes.

Re-embedding and reindexing are not exceptional events. They are normal maintenance.

8.7 Evaluate with application metrics, not only ANN metrics

Recall@10 and latency matter, but the real questions are:

  • does the agent answer better?
  • are citations more accurate?
  • does the memory retrieval help task completion?
  • does the agent recall the right past decisions?

A vector DB is successful only if the overall AI system improves.


9) Which systems matter today: open source and commercial

The market has become crowded, but the landscape is easier to understand if we group systems by style.

Open source vector databases and vector-native systems

1. FAISS

Technically a library rather than a full database, but still foundational.

Strengths:

  • rich index families
  • GPU support
  • strong research pedigree
  • excellent for custom pipelines

Trade-off:

  • you must build more of the operational database layer yourself

2. Milvus

One of the best-known open source vector database platforms.

Strengths:

  • distributed architecture
  • multiple index options
  • strong ecosystem presence
  • designed as a database/service, not just a library

3. Weaviate

Popular for developer-friendly vector search with integrated APIs and AI-oriented workflow support.

Strengths:

  • easy developer experience
  • metadata and schema support
  • strong positioning around GenAI applications

4. Qdrant

A well-regarded open source vector database with a reputation for pragmatic design and strong filtering support.

Strengths:

  • payload-aware retrieval
  • clean APIs
  • good production ergonomics

5. pgvector

An extension that brings vector similarity search into PostgreSQL.

Strengths:

  • uses existing Postgres operational model
  • great when your workload already lives in Postgres
  • easy adoption path for many teams

Trade-off:

  • not always the best choice for the largest or most latency-sensitive vector workloads

6. LanceDB

Interesting for local-first and analytics-adjacent workflows with columnar/storage-aware design.

7. Elasticsearch / OpenSearch vector capabilities

Not pure vector databases, but widely used because enterprises already run them.

Strengths:

  • hybrid search
  • operational familiarity
  • strong filtering and text search

Trade-off:

  • vector retrieval is part of a broader search platform rather than a dedicated vector-native architecture

Commercial / managed vector database offerings

1. Pinecone

Probably the most visible dedicated managed vector database brand from the GenAI era.

Strengths:

  • managed service simplicity
  • strong GenAI/RAG positioning
  • good developer experience

2. Weaviate Cloud / managed offerings

Managed deployment path for teams that want the Weaviate model without self-hosting.

3. Qdrant Cloud

Managed deployment option for Qdrant users.

4. Milvus-based managed services, including Zilliz Cloud

Useful for teams that want Milvus capabilities in managed form.

5. Cloud platform offerings

Major cloud providers increasingly offer vector search inside broader platforms:

  • vector support in managed databases
  • search platforms with ANN support
  • AI knowledge-base products backed by vector retrieval

This blurs the line between “vector database” and “vector capability inside a general database/search product.”

The practical selection rule

Pick based on workload, not hype.

Questions that matter more than branding:

  • data size
  • update frequency
  • filter complexity
  • multi-tenancy needs
  • operational preferences
  • exact vs approximate requirements
  • hybrid search needs
  • whether you already run Postgres / Elasticsearch / a cloud-native stack

For many teams, the real competition is not “vector DB A vs vector DB B.” It is:

  • dedicated vector database
  • Postgres with pgvector
  • search engine with vector support
  • custom FAISS-based service

10) Design trade-offs that matter more than marketing

The vector DB space has plenty of hype. The durable questions are still the old systems questions.

Recall vs latency

Higher recall usually costs more compute or memory.

Memory vs accuracy

Compression and smaller indexes save money, but often reduce accuracy.

Freshness vs optimization

Highly optimized indexes can be expensive to update online.

Simplicity vs flexibility

A simple managed service is easier to adopt; a lower-level library can be tuned more aggressively.

Semantic retrieval vs exact constraints

Pure vector search is elegant in demos. Real enterprise systems need permissions, dates, structured filters, and exact match behavior.


11) What happens next

The future of vector databases is probably not “just more HNSW.”

A few trends are already visible:

  • hybrid dense + sparse retrieval becomes standard
  • multimodal indexing becomes normal rather than special
  • disk-aware and tiered-storage systems matter more as corpora grow
  • streaming and dynamic indexing become more important for constantly changing knowledge bases
  • agent memory systems demand better support for episodic recall, decay, summarization, and retrieval policies
  • the boundary between vector DB, search engine, and general database with vector support keeps blurring

Benchmarking is also becoming more realistic. The Big ANN challenge and related efforts show that future evaluation must cover more than static dense search. Filtering, sparse retrieval, OOD queries, and streaming updates all matter in real systems.


Conclusion

A vector database is not magic. It is the convergence of two old ideas and one new market force:

  • old idea #1: embeddings map meaning into geometry
  • old idea #2: approximate nearest-neighbor search makes high-dimensional retrieval practical
  • new force: generative AI and agents made external semantic retrieval a mainstream application primitive

That is why vector databases suddenly feel ubiquitous.

They are not replacing relational databases. They are not replacing search engines. And they are not a complete memory architecture by themselves.

What they are is a specialized retrieval layer for embedding-based systems. In the age of RAG and Agentic AI, that layer has become essential.

If you are building agents, the best way to think about vector databases is simple:

use them for semantic recall, not as a substitute for every other data system.

That mental model will save you a lot of architectural pain.


References

  1. Le Ma et al. A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge. arXiv, 2025. https://arxiv.org/html/2310.11703v2
  2. Wenqi Li et al. A Survey of Vector Database Management Systems. arXiv, 2023. https://arxiv.org/abs/2310.14021
  3. Hervé Jégou, Matthijs Douze, and Cordelia Schmid. Product Quantization for Nearest Neighbor Search. IEEE TPAMI, 2011.
  4. Yu A. Malkov and Dmitry A. Yashunin. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. arXiv:1603.09320, 2016. https://arxiv.org/abs/1603.09320
  5. Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. arXiv:1702.08734, 2017. https://arxiv.org/abs/1702.08734
  6. FAISS project documentation and repository. https://github.com/facebookresearch/faiss
  7. Subramanya et al. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. Microsoft Research / NeurIPS, 2019. https://www.microsoft.com/en-us/research/publication/diskann-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node/
  8. Patrick Lewis et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401. https://arxiv.org/abs/2005.11401
  9. Harsha Vardhan Simhadri et al. Results of the Big ANN: NeurIPS’23 competition. arXiv:2409.17424. https://arxiv.org/abs/2409.17424
  10. Milvus documentation. https://milvus.io/
  11. Weaviate documentation. https://weaviate.io/
  12. Qdrant documentation. https://qdrant.tech/
  13. pgvector repository. https://github.com/pgvector/pgvector
  14. Pinecone learning materials and product documentation. https://www.pinecone.io/