Putting it All Together: Building a Complete Retrieval System with HNSW and Reranking

In our previous post, we explored reranking techniques to refine the results obtained from an initial search. Today, we're taking a significant step forward by combining HNSW for efficient retrieval with reranking for precision. We're building a complete retrieval system, from data preparation to final result presentation.

The Big Picture: A Two-Stage Pipeline

Our system operates in two distinct stages:

  1. Retrieval (HNSW): This stage quickly identifies a set of candidate documents based on a query vector. Think of it as casting a wide net – we want to retrieve a manageable number of potentially relevant documents.
  2. Reranking: This stage re-evaluates the candidate documents from the retrieval stage, using a more sophisticated model to determine the most relevant results. Think of this as carefully examining the candidates from the net to select the best fit.
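Before bringing in any libraries, the two-stage shape can be sketched end to end in plain NumPy. This is purely illustrative: a brute-force dot product stands in for HNSW, and exact cosine similarity plays the reranker.

```python
import numpy as np

def two_stage_search(query_vec, doc_vecs, k_retrieve=3, k_final=2):
    """Toy two-stage pipeline: a cheap first pass narrows the pool,
    a more careful second pass reorders the survivors."""
    # Stage 1: retrieval -- cast the wide net with a cheap score.
    # (In the real system this is HNSW; here, raw dot products.)
    coarse = np.argsort(doc_vecs @ query_vec)[::-1][:k_retrieve]
    # Stage 2: reranking -- exact cosine similarity over the small candidate set.
    norms = np.linalg.norm(doc_vecs[coarse], axis=1) * np.linalg.norm(query_vec)
    fine = (doc_vecs[coarse] @ query_vec) / norms
    return coarse[np.argsort(fine)[::-1]][:k_final]

top = two_stage_search(np.array([1.0, 0.0]),
                       np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]))
print(top)  # document ids, best first
```

The rest of the post replaces stage 1 with an HNSW index and stage 2 with progressively better rerankers, but the data flow stays exactly this.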

1. Data Preparation

Let's assume we have a dataset of product descriptions. We need to convert these descriptions into numerical vectors. We're using Sentence Transformers for this purpose.

from sentence_transformers import SentenceTransformer
import numpy as np

# Load a pre-trained Sentence Transformer model
model = SentenceTransformer('all-mpnet-base-v2')

# Sample product descriptions
descriptions = [
    "High-quality leather wallet with multiple card slots.",
    "Stylish cotton t-shirt for everyday wear.",
    "Durable stainless steel water bottle with leak-proof lid.",
    "Comfortable running shoes with excellent cushioning.",
    "Elegant silk scarf with intricate floral pattern.",
    "Robust backpack with multiple compartments."
]

# Convert descriptions to embeddings
embeddings = model.encode(descriptions)

print(f"Shape of embeddings: {embeddings.shape}") # Expected: (6, 768)

2. Building the HNSW Index

Now, we're using nmslib to build an HNSW index on top of the generated embeddings.

import nmslib

# Create an HNSW index over cosine distance
index = nmslib.init(method='hnsw', space='cosinesimil')

# Add the embeddings to the index (ids default to 0..n-1)
index.addDataPointBatch(embeddings)

# Build the index (tune M and efConstruction for build time vs. recall).
# M is the number of links per node; efConstruction is the size of the
# candidate list used while constructing the graph.
index.createIndex({'M': 16, 'efConstruction': 200})

print(f"Index built with {len(descriptions)} documents.")

3. Querying the HNSW Index

Let's say we have a query: "Find a durable bag for travel." We need to convert this query into a vector.

query = "Find a durable bag for travel."
query_embedding = model.encode(query)

# efSearch trades query speed for recall
index.setQueryTimeParams({'efSearch': 100})

# Search the index
N = 3  # Retrieve top 3 candidates
results = index.knnQuery(query_embedding, k=N)  # returns (ids, distances)

print(f"Query results: {results}")

# Print the retrieved descriptions
for i in results[0]:
    print(f"Document {i}: {descriptions[i]}")

4. Implementing a Simple Reranking Model

For simplicity, we're using exact cosine similarity as our reranking score, comparing the query vector to each retrieved candidate's embedding.

def rerank(query_embedding, candidates, embeddings):
    """Reranks candidate ids by exact cosine similarity to the query."""
    scores = []
    for i in candidates:
        vec = embeddings[i]
        # Cosine similarity: normalized dot product of query and document vectors
        score = np.dot(query_embedding, vec) / (np.linalg.norm(query_embedding) * np.linalg.norm(vec))
        scores.append(score)

    # argsort gives positions within scores, so map back to document ids
    order = np.argsort(scores)[::-1]
    return [candidates[j] for j in order]

# Rerank the retrieved candidates
ranked_candidates = rerank(query_embedding, results[0], embeddings)

print("Reranked candidates:")
for i in ranked_candidates:
    print(f"Document {i}: {descriptions[i]}")

5. Combining HNSW and Reranking: The Complete Pipeline

Let's encapsulate the entire process into a function.

def retrieve_and_rerank(query, model, index, embeddings, n_candidates=5):
    """Retrieves candidates with HNSW, then reranks them by exact cosine similarity."""

    query_embedding = model.encode(query)

    # HNSW retrieval: cast the wide net
    results = index.knnQuery(query_embedding, k=n_candidates)

    # Reranking: reorder the candidate ids, best first
    ranked_candidates = rerank(query_embedding, results[0], embeddings)

    return ranked_candidates

# Example usage
ranked_candidates = retrieve_and_rerank("Find a durable bag for travel.", model, index, embeddings)

print("Final ranked candidates:")
for i in ranked_candidates:
    print(f"Document {i}: {descriptions[i]}")

6. A More Sophisticated Reranking Model (Cross-Encoder)

While cosine similarity is simple, it doesn't capture the nuances of language. Let's use a cross-encoder model for more accurate reranking. We're using sentence-transformers again, but this time with a cross-encoder.

from sentence_transformers import CrossEncoder

# Load a cross-encoder trained for passage ranking
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank_cross_encoder(query, candidates, descriptions, cross_encoder):
    """Reranks candidate ids using a cross-encoder over (query, document) pairs."""
    pairs = [(query, descriptions[i]) for i in candidates]
    scores = cross_encoder.predict(pairs)  # one relevance score per pair

    # Map sorted score positions back to document ids, best first
    order = np.argsort(scores)[::-1]
    return [candidates[j] for j in order]

# Example usage
N = 5
results = index.knnQuery(query_embedding, k=N)
ranked_candidates = rerank_cross_encoder(query, results[0], descriptions, cross_encoder)

print("Final ranked candidates (cross-encoder):")
for i in ranked_candidates:
    print(f"Document {i}: {descriptions[i]}")

7. Performance Considerations and Tuning

  • HNSW Parameters (M, efConstruction, efSearch): Experiment with different values to balance index build time, query latency, and search accuracy.
  • Reranking Model: Choose a model that aligns with your specific use case. Consider factors like model size, accuracy, and inference time.
  • Hybrid Approach: Combine HNSW with other retrieval methods for improved performance.
  • Caching: Cache frequently used embeddings and search results to reduce latency.
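The caching bullet is the cheapest to prototype: memoize the query encoder so a repeated query never hits the model twice. A minimal sketch, where fake_encode is a hypothetical stand-in for model.encode so the snippet runs on its own:

```python
from functools import lru_cache

def make_cached_encoder(encode_fn, maxsize=1024):
    """Wrap an embedding function with an LRU cache keyed on the raw query string."""
    @lru_cache(maxsize=maxsize)
    def cached(text):
        return encode_fn(text)
    return cached

# Stand-in encoder so the sketch is self-contained; in the pipeline above
# you would wrap model.encode instead.
calls = []
def fake_encode(text):
    calls.append(text)
    return (float(len(text)),)  # dummy "embedding"

encode = make_cached_encoder(fake_encode)
encode("Find a durable bag for travel.")
encode("Find a durable bag for travel.")  # served from cache
print(len(calls))  # 1 -- the underlying encoder ran only once
```

The same pattern works for caching (query, top-N) search results, as long as the cache is invalidated when the index is rebuilt.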

Key Takeaways

  • Combining HNSW for efficient retrieval with a sophisticated reranking model significantly improves the quality of search results.
  • Careful tuning of HNSW parameters and selection of an appropriate reranking model are crucial for optimal performance.
  • This two-stage pipeline offers a flexible and scalable solution for a wide range of retrieval tasks.
  • Choosing the right model and understanding its trade-offs is essential for achieving the best results.

This complete retrieval system demonstrates how to leverage the power of HNSW and reranking to build a high-performance search application. By combining efficient retrieval with accurate relevance scoring, you can deliver a superior user experience and unlock the full potential of your data.
