Putting it All Together: Building a Complete Retrieval System with HNSW and Reranking

In our previous post, we explored reranking techniques to refine the results obtained from an initial search. Today, we're taking a significant step forward by combining HNSW for efficient retrieval with reranking for precision. We're building a complete retrieval system, from data preparation to final result presentation.

The Big Picture: A Two-Stage Pipeline

Our system operates in two distinct stages:

  1. Retrieval (HNSW): This stage quickly identifies a set of candidate documents based on a query vector. Think of it as casting a wide net – we want to retrieve a manageable number of potentially relevant documents.
  2. Reranking: This stage re-evaluates the candidate documents from the retrieval stage, using a more sophisticated model to determine the most relevant results. Think of this as carefully examining the candidates from the net to select the best fit.
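Before bringing in any libraries, the two-stage shape can be sketched end to end in plain NumPy. This is purely illustrative: a brute-force dot product stands in for HNSW, and exact cosine similarity plays the reranker.

```python
import numpy as np

def two_stage_search(query_vec, doc_vecs, k_retrieve=3, k_final=2):
    """Toy two-stage pipeline: a cheap first pass narrows the pool,
    a more careful second pass reorders the survivors."""
    # Stage 1: retrieval -- cast the wide net with a cheap score.
    # (In the real system this is HNSW; here, raw dot products.)
    coarse = np.argsort(doc_vecs @ query_vec)[::-1][:k_retrieve]
    # Stage 2: reranking -- exact cosine similarity over the small candidate set.
    norms = np.linalg.norm(doc_vecs[coarse], axis=1) * np.linalg.norm(query_vec)
    fine = (doc_vecs[coarse] @ query_vec) / norms
    return coarse[np.argsort(fine)[::-1]][:k_final]

top = two_stage_search(np.array([1.0, 0.0]),
                       np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]))
print(top)  # document ids, best first
```

The rest of the post replaces stage 1 with an HNSW index and stage 2 with progressively better rerankers, but the data flow stays exactly this.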

1. Data Preparation

Let's assume we have a dataset of product descriptions. We need to convert these descriptions into numerical vectors. We're using Sentence Transformers for this purpose.

from sentence_transformers import SentenceTransformer
import numpy as np

# Load a pre-trained Sentence Transformer model
model = SentenceTransformer('all-mpnet-base-v2')

# Sample product descriptions
descriptions = [
    "High-quality leather wallet with multiple card slots.",
    "Stylish cotton t-shirt for everyday wear.",
    "Durable stainless steel water bottle with leak-proof lid.",
    "Comfortable running shoes with excellent cushioning.",
    "Elegant silk scarf with intricate floral pattern.",
    "Robust backpack with multiple compartments."
]

# Convert descriptions to embeddings
embeddings = model.encode(descriptions)

print(f"Shape of embeddings: {embeddings.shape}") # Expected: (6, 768)

2. Building the HNSW Index

Now, we're using nmslib to build an HNSW index on top of the generated embeddings.

import nmslib

# Create an HNSW index over cosine distance
index = nmslib.init(method='hnsw', space='cosinesimil')

# Add the embeddings to the index (ids default to 0..n-1)
index.addDataPointBatch(embeddings)

# Build the index (tune M and efConstruction for build time vs. recall).
# M is the number of links per node; efConstruction is the size of the
# candidate list used while constructing the graph.
index.createIndex({'M': 16, 'efConstruction': 200})

print(f"Index built with {len(descriptions)} documents.")

3. Querying the HNSW Index

Let's say we have a query: "Find a durable bag for travel." We need to convert this query into a vector.

query = "Find a durable bag for travel."
query_embedding = model.encode(query)

# efSearch trades query speed for recall
index.setQueryTimeParams({'efSearch': 100})

# Search the index
N = 3  # Retrieve top 3 candidates
results = index.knnQuery(query_embedding, k=N)  # returns (ids, distances)

print(f"Query results: {results}")

# Print the retrieved descriptions
for i in results[0]:
    print(f"Document {i}: {descriptions[i]}")

4. Implementing a Simple Reranking Model

For simplicity, we're using exact cosine similarity as our reranking score, comparing the query vector to each retrieved candidate's embedding.

def rerank(query_embedding, candidates, embeddings):
    """Reranks candidate ids by exact cosine similarity to the query."""
    scores = []
    for i in candidates:
        vec = embeddings[i]
        # Cosine similarity: normalized dot product of query and document vectors
        score = np.dot(query_embedding, vec) / (np.linalg.norm(query_embedding) * np.linalg.norm(vec))
        scores.append(score)

    # argsort gives positions within scores, so map back to document ids
    order = np.argsort(scores)[::-1]
    return [candidates[j] for j in order]

# Rerank the retrieved candidates
ranked_candidates = rerank(query_embedding, results[0], embeddings)

print("Reranked candidates:")
for i in ranked_candidates:
    print(f"Document {i}: {descriptions[i]}")

5. Combining HNSW and Reranking: The Complete Pipeline

Let's encapsulate the entire process into a function.

def retrieve_and_rerank(query, model, index, embeddings, n_candidates=5):
    """Retrieves candidates with HNSW, then reranks them by exact cosine similarity."""

    query_embedding = model.encode(query)

    # HNSW retrieval: cast the wide net
    results = index.knnQuery(query_embedding, k=n_candidates)

    # Reranking: reorder the candidate ids, best first
    ranked_candidates = rerank(query_embedding, results[0], embeddings)

    return ranked_candidates

# Example usage
ranked_candidates = retrieve_and_rerank("Find a durable bag for travel.", model, index, embeddings)

print("Final ranked candidates:")
for i in ranked_candidates:
    print(f"Document {i}: {descriptions[i]}")

6. A More Sophisticated Reranking Model (Cross-Encoder)

While cosine similarity is simple, it doesn't capture the nuances of language. Let's use a cross-encoder model for more accurate reranking. We're using sentence-transformers again, but this time with a cross-encoder.

from sentence_transformers import CrossEncoder

# Load a cross-encoder trained for passage ranking
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank_cross_encoder(query, candidates, descriptions, cross_encoder):
    """Reranks candidate ids using a cross-encoder over (query, document) pairs."""
    pairs = [(query, descriptions[i]) for i in candidates]
    scores = cross_encoder.predict(pairs)  # one relevance score per pair

    # Map sorted score positions back to document ids, best first
    order = np.argsort(scores)[::-1]
    return [candidates[j] for j in order]

# Example usage
N = 5
results = index.knnQuery(query_embedding, k=N)
ranked_candidates = rerank_cross_encoder(query, results[0], descriptions, cross_encoder)

print("Final ranked candidates (cross-encoder):")
for i in ranked_candidates:
    print(f"Document {i}: {descriptions[i]}")

7. Performance Considerations and Tuning

  • HNSW Parameters (M, efConstruction, efSearch): Experiment with different values to balance index build time, query latency, and search accuracy.
  • Reranking Model: Choose a model that aligns with your specific use case. Consider factors like model size, accuracy, and inference time.
  • Hybrid Approach: Combine HNSW with other retrieval methods for improved performance.
  • Caching: Cache frequently used embeddings and search results to reduce latency.
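The caching bullet is the cheapest to prototype: memoize the query encoder so a repeated query never hits the model twice. A minimal sketch, where fake_encode is a hypothetical stand-in for model.encode so the snippet runs on its own:

```python
from functools import lru_cache

def make_cached_encoder(encode_fn, maxsize=1024):
    """Wrap an embedding function with an LRU cache keyed on the raw query string."""
    @lru_cache(maxsize=maxsize)
    def cached(text):
        return encode_fn(text)
    return cached

# Stand-in encoder so the sketch is self-contained; in the pipeline above
# you would wrap model.encode instead.
calls = []
def fake_encode(text):
    calls.append(text)
    return (float(len(text)),)  # dummy "embedding"

encode = make_cached_encoder(fake_encode)
encode("Find a durable bag for travel.")
encode("Find a durable bag for travel.")  # served from cache
print(len(calls))  # 1 -- the underlying encoder ran only once
```

The same pattern works for caching (query, top-N) search results, as long as the cache is invalidated when the index is rebuilt.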

Key Takeaways

  • Combining HNSW for efficient retrieval with a sophisticated reranking model significantly improves the quality of search results.
  • Careful tuning of HNSW parameters and selection of an appropriate reranking model are crucial for optimal performance.
  • This two-stage pipeline offers a flexible and scalable solution for a wide range of retrieval tasks.
  • Choosing the right model and understanding its trade-offs is essential for achieving the best results.

This complete retrieval system demonstrates how to leverage the power of HNSW and reranking to build a high-performance search application. By combining efficient retrieval with accurate relevance scoring, you can deliver a superior user experience and unlock the full potential of your data.
