Cluster vectors into cells, search only the nearest cells, and compress vectors to save memory.
IVF: assign each vector to its nearest centroid, then search only the closest few cells. PQ: compress each vector into a few small codes.
Frequently asked questions
Is FAISS what Google Lens uses to match my shoe photo?
That is the kind of job it does, though FAISS comes from Meta AI and Google may well run Lens on its own software. A visual search service like Google Lens has to match your shoe photo against billions of product images to find the closest-looking ones. FAISS makes searching billions of vectors feasible: it clusters them into cells and searches only the nearest cells (so it skips most of the data), and it compresses vectors so they fit in memory. Like all approximate methods it can occasionally miss the exact closest match, but that trade is what delivers near-instant results at billion-image scale.
What does the IVF (inverted file) part of FAISS do?
It avoids comparing the query against every vector. During indexing, all vectors are clustered (via k-means) around a set of centroids, and each vector is filed under its nearest centroid - an 'inverted file' from centroid to its member vectors. At query time, FAISS finds the few centroids nearest the query and searches only those cells, skipping the vast majority of vectors. The nprobe parameter controls how many cells to search: more cells means higher accuracy but slower queries. This cluster-and-skip strategy is the core speed-up.
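The cluster-and-skip idea can be sketched in a few lines of plain Python. This is a toy illustration, not FAISS's implementation: the helper names (dist2, kmeans, build_ivf, ivf_search), the random test data and the sizes are all assumptions chosen for readability, and only nprobe corresponds to the parameter described above.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=5, seed=0):
    """Plain Lloyd's k-means; returns k centroids (hypothetical helper)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: dist2(v, centroids[c]))
            groups[nearest].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cell ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_ivf(vectors, centroids):
    """Inverted file: centroid index -> ids of the vectors filed under it."""
    cells = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(cells, key=lambda c: dist2(v, centroids[c]))
        cells[nearest].append(i)
    return cells

def ivf_search(query, vectors, centroids, cells, nprobe):
    """Scan only the nprobe nearest cells; return (best_id, vectors_scanned)."""
    order = sorted(cells, key=lambda c: dist2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in cells[c]]
    if not candidates:  # every probed cell happened to be empty
        return None, 0
    best = min(candidates, key=lambda i: dist2(query, vectors[i]))
    return best, len(candidates)

# Toy data: 1000 random 8-dimensional vectors filed into 16 cells.
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
centroids = kmeans(data, k=16)
cells = build_ivf(data, centroids)

query = [rng.gauss(0, 1) for _ in range(8)]
approx_id, scanned = ivf_search(query, data, centroids, cells, nprobe=3)
print("best id:", approx_id, "scanned", scanned, "of", len(data), "vectors")
```

Raising nprobe to the number of cells makes the search scan every vector, which turns it back into an exact (and slow) brute-force search.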
What is product quantization (PQ) and why does it matter?
PQ is FAISS's compression trick. A high-dimensional vector is split into several sub-vectors, and each sub-vector is replaced by the ID of the nearest entry in a small learned codebook (typically 256 entries, so each ID fits in one byte). So a vector that took, say, 512 bytes of floats (128 float32 values) becomes a handful of byte-sized codes - often a 10-30x reduction; 32 one-byte codes, for example, would be a 16x reduction. Distances can be estimated directly from these codes using precomputed tables. This is what lets FAISS hold billions of vectors in RAM and compute approximate distances extremely fast; the cost is a small loss of precision.
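A minimal sketch of the split, encode and table-lookup steps follows, again in plain Python rather than FAISS itself. The function names, the tiny sizes (16 dimensions, 4 sub-vectors, 16 codebook entries) and the random data are assumptions made to keep it fast; real indexes use far larger settings.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=8, seed=0):
    """Plain Lloyd's k-means, used here to learn each codebook."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: dist2(p, centroids[c]))].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old entry if nothing was assigned to it
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def split(v, m):
    """Cut a vector into m equal-length sub-vectors."""
    step = len(v) // m
    return [v[j * step:(j + 1) * step] for j in range(m)]

def train_pq(vectors, m, ksub):
    """Learn one codebook of ksub entries for each of the m sub-spaces."""
    return [kmeans([split(v, m)[j] for v in vectors], ksub, seed=j) for j in range(m)]

def encode(v, codebooks):
    """Replace each sub-vector by the index of its nearest codebook entry."""
    return bytes(
        min(range(len(book)), key=lambda c: dist2(sub, book[c]))
        for sub, book in zip(split(v, len(codebooks)), codebooks)
    )

def distance_tables(query, codebooks):
    """tables[j][c] = squared distance from query sub-vector j to entry c."""
    return [[dist2(sub, entry) for entry in book]
            for sub, book in zip(split(query, len(codebooks)), codebooks)]

def approx_dist2(code, tables):
    """Estimate a distance from the codes alone: one lookup per sub-vector."""
    return sum(tables[j][c] for j, c in enumerate(code))

rng = random.Random(1)
dim, m, ksub = 16, 4, 16  # toy sizes; ksub must be at most 256 to fit one byte
data = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(500)]
codebooks = train_pq(data, m, ksub)
codes = [encode(v, codebooks) for v in data]

query = [rng.gauss(0, 1) for _ in range(dim)]
tables = distance_tables(query, codebooks)  # computed once per query
estimate = approx_dist2(codes[0], tables)
exact = dist2(query, data[0])

raw_bytes = dim * 4  # size of the vector if stored as float32
print("bytes per vector:", raw_bytes, "->", len(codes[0]))
print("exact distance:", exact, "estimated from codes:", estimate)
```

In this toy setup each vector shrinks from 64 bytes to 4, a 16x reduction. The estimate equals the distance to the vector rebuilt from its codebook entries, which is why it is close to the exact distance but not identical to it.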
Why is FAISS 'approximate' - can it miss the true nearest neighbour?
Yes, by design, and the demo shows it: if the true nearest vector sits in a cell that was not among the nprobe searched, FAISS will not find it and returns the best from the cells it did search. Likewise, PQ compression introduces small distance errors. These approximations are deliberate trades: exact nearest-neighbour search in high dimensions is prohibitively slow at scale. FAISS accepts occasionally missing the exact best in exchange for being orders of magnitude faster, and you tune nprobe to balance recall against speed.
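The missed-neighbour case is easy to reproduce by hand. The sketch below uses four made-up 2-D points and two made-up centroids (all names and coordinates are hypothetical, and this is not FAISS code): the query is slightly closer to the left centroid, but its true nearest neighbour was filed under the right one.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Two cells, with centroids at x=0 and x=10 (hand-picked for illustration).
centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = {"u": (1.0, 0.0), "w": (-2.0, 1.0), "v": (5.2, 0.0), "x": (9.0, 1.0)}

# File each vector under its nearest centroid: u and w land in cell 0,
# v and x land in cell 1.
cells = {c: [] for c in range(len(centroids))}
for name, vec in vectors.items():
    cells[min(cells, key=lambda c: dist2(vec, centroids[c]))].append(name)

def search(query, nprobe):
    """Return the closest vector among the nprobe cells nearest the query."""
    order = sorted(cells, key=lambda c: dist2(query, centroids[c]))
    candidates = [n for c in order[:nprobe] for n in cells[c]]
    return min(candidates, key=lambda n: dist2(query, vectors[n]))

query = (4.5, 0.0)  # nearer to centroid 0, but its true neighbour v is in cell 1
print(search(query, nprobe=1))  # prints "u": only cell 0 is searched, v is missed
print(search(query, nprobe=2))  # prints "v": both cells searched, exact answer
```

With nprobe=1 the search never looks at v, so it returns u even though v is much closer; probing one more cell recovers the true neighbour at the cost of scanning more vectors.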
How does FAISS compare to HNSW?
Both are approximate nearest-neighbour methods for vector search, but they use different structures. Strictly speaking, FAISS is a library and HNSW is an algorithm, so the fair comparison is between FAISS's IVF+PQ indexes and HNSW. IVF+PQ clusters vectors and compresses them, excelling at huge datasets where memory is the constraint - PQ's compression is its standout feature. HNSW builds a navigable layered graph and typically gives excellent recall and speed for in-memory datasets, but uses more memory per vector because it usually keeps the full vectors plus the graph links. FAISS itself also offers HNSW index types. Rough guide: IVF+PQ for billion-scale memory-constrained sets; HNSW for top recall when memory allows. They are complementary tools in the same vector-search toolbox.
Where is FAISS used?
FAISS (from Meta AI) is a workhorse of modern AI retrieval. It powers semantic search (finding text by meaning via embeddings), recommendation systems (similar items or users), image and audio similarity search, and - prominently - the retrieval step of RAG (retrieval-augmented generation), where an LLM looks up relevant documents by embedding similarity before answering. Anywhere you need to find the most similar items among millions or billions of embedding vectors quickly, FAISS is one of the standard engines, alongside vector databases that often use HNSW internally.