Reranking: Fine-Tuning Search Results with a Second-Level Model

In our previous post, we explored Hierarchical Navigable Small World (HNSW) graphs for efficient approximate nearest neighbor search. While HNSW significantly improves speed, the initial search results are still approximations. They're good candidates, but not necessarily the absolute best matches. This is where reranking comes into play. Reranking uses a second-level model to refine the initial results and prioritize the most relevant documents. This post dives into the concepts and implementation of reranking, building on our understanding of HNSW and demonstrating how it can dramatically improve search quality.

Why Reranking? The Limitations of Initial Search

The initial HNSW search provides a fast, approximate solution. However, it relies solely on vector similarity. This can be limiting because:

  • Vector representations don't capture all nuances: Vector embeddings, while powerful, often struggle to capture complex semantic relationships, user intent, or domain-specific knowledge.
  • Similarity isn't always relevance: Two documents might have similar vectors, but one might be far more relevant to a specific user query based on factors not represented in the vectors.
  • Query understanding is crucial: The initial search doesn't always perfectly understand the user's intent. Reranking can correct for this.

The Reranking Pipeline: A Two-Stage Approach

The reranking pipeline typically consists of two stages:

  1. Initial Retrieval: This is our HNSW search, quickly identifying a set of candidate documents.
  2. Reranking: A secondary model assesses the relevance of these candidates and reorders them based on a more sophisticated scoring function.
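As a minimal sketch of this two-stage flow, the pipeline might look like the following. Note the assumptions: a brute-force cosine scan stands in for the HNSW index (the index itself was covered in the previous post), and the reranker here reuses cosine similarity purely to show the call structure; in practice the second stage would use a heavier scoring model.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def initial_retrieval(query_vector, document_vectors, top_k):
    # Stage 1: fast candidate generation. A brute-force scan stands in for
    # the HNSW index here; in production this call would hit the ANN index.
    scores = [cosine(query_vector, dv) for dv in document_vectors]
    candidate_ids = np.argsort(scores)[::-1][:top_k]
    return [(int(i), scores[i]) for i in candidate_ids]

def rerank(query_vector, document_vectors, candidates, scoring_fn):
    # Stage 2: rescore only the small candidate set with a costlier model.
    rescored = [(doc_id, scoring_fn(query_vector, document_vectors[doc_id]))
                for doc_id, _ in candidates]
    return sorted(rescored, key=lambda x: x[1], reverse=True)

# Example: retrieve 3 candidates from 5 documents, then rerank them.
rng = np.random.default_rng(42)
doc_vectors = [rng.standard_normal(8) for _ in range(5)]
query = rng.standard_normal(8)

candidates = initial_retrieval(query, doc_vectors, top_k=3)
final = rerank(query, doc_vectors, candidates, scoring_fn=cosine)
print(final)
```

The key design point is that the expensive `scoring_fn` only ever sees the small candidate set, never the full corpus.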

Building a Reranking Model: Cross-Encoders vs. Bi-Encoders

There are two main approaches to building a reranking model:

  1. Cross-Encoders: These models process the query and document together in a single forward pass to predict a relevance score. They are highly accurate but computationally expensive, making them impractical for reranking large result sets.
  2. Bi-Encoders (also called dual encoders): These models encode the query and document separately and then compare their representations to generate a relevance score. Because document representations can be precomputed offline, they are much faster than cross-encoders and better suited for reranking large candidate sets.

We'll focus on bi-encoders first for their practicality, and return to cross-encoders in Example 3.
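To make the cost asymmetry concrete, here is a toy sketch that simply counts model calls. The "models" are stand-in hash functions, not real encoders; only the call pattern is the point.

```python
calls = {"encode": 0, "joint": 0}

def encode(text):
    calls["encode"] += 1
    return hash(text) % 1000 / 1000.0  # stand-in embedding (a single float)

def joint_score(query, document):
    calls["joint"] += 1
    return hash((query, document)) % 1000 / 1000.0  # stand-in relevance score

docs = [f"doc{i}" for i in range(100)]

# Bi-encoder: embed documents once offline, then one encode call per query.
doc_embs = [encode(d) for d in docs]               # 100 calls, done ahead of time
for query in ["q1", "q2", "q3"]:
    q = encode(query)                              # 1 call per query
    scores = [q * d for d in doc_embs]             # cheap comparisons

# Cross-encoder: one full forward pass per (query, document) pair.
for query in ["q1", "q2", "q3"]:
    scores = [joint_score(query, d) for d in docs] # 100 calls per query

print(calls)  # encode: 100 + 3 = 103, joint: 3 * 100 = 300
```

For N documents and Q queries, the bi-encoder costs N + Q encoder calls (with the N done offline), while the cross-encoder costs N × Q joint forward passes at query time.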

Example 1: Simple Cosine Similarity Reranking

Let's start with a simple example using cosine similarity. This demonstrates the basic principle of reranking without introducing complex models.

import numpy as np

def rerank_cosine_similarity(query_vector, documents, document_vectors, initial_scores):
    """
    Reranks documents using cosine similarity.

    Args:
        query_vector: The query vector.
        documents: A list of document identifiers.
        document_vectors: A list of document vectors.
        initial_scores: Initial scores from HNSW search.

    Returns:
        A list of document identifiers sorted by reranked score.
    """
    reranked_scores = []
    for doc, doc_vector, initial_score in zip(documents, document_vectors, initial_scores):
        cosine_similarity = np.dot(query_vector, doc_vector) / (
            np.linalg.norm(query_vector) * np.linalg.norm(doc_vector)
        )
        # Blend cosine similarity with the original HNSW score (weighted by 0.1)
        reranked_score = cosine_similarity + initial_score * 0.1
        reranked_scores.append((doc, reranked_score))

    reranked_scores.sort(key=lambda x: x[1], reverse=True)
    return [doc for doc, score in reranked_scores]

# Example Usage
query_vector = np.array([0.1, 0.2, 0.3, 0.4])
documents = ["doc1", "doc2", "doc3"]
document_vectors = [np.array([0.2, 0.3, 0.4, 0.5]),
                    np.array([0.3, 0.4, 0.5, 0.6]),
                    np.array([0.4, 0.5, 0.6, 0.7])]
initial_scores = [0.8, 0.6, 0.9]  # Initial scores from HNSW

reranked_documents = rerank_cosine_similarity(query_vector, documents, document_vectors, initial_scores)
print("Reranked Documents:", reranked_documents)

Explanation:

  • The rerank_cosine_similarity function takes the query vector, document identifiers, document vectors, and initial HNSW scores as input.
  • It calculates the cosine similarity between the query vector and each document vector.
  • It combines the cosine similarity score with the initial HNSW score (weighted by 0.1) to produce a reranked score.
  • Finally, it sorts the documents based on the reranked score and returns the sorted list.
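One caveat with the weighted sum above: the two scores may live on different scales (HNSW implementations variously return distances or similarities), so it often helps to normalize before blending. A minimal sketch, assuming both score lists have at least some spread; the `blend` helper and the 0.1 weight are illustrative choices, not a fixed recipe:

```python
import numpy as np

def min_max_normalize(scores):
    # Map scores to [0, 1] so the blend weight is meaningful across scales.
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)  # all scores equal: contribute nothing
    return (scores - scores.min()) / span

def blend(rerank_scores, initial_scores, weight=0.1):
    # weight controls how much the original HNSW ranking still matters.
    return min_max_normalize(rerank_scores) + weight * min_max_normalize(initial_scores)

combined = blend([0.91, 0.85, 0.88], [0.8, 0.6, 0.9])
print(combined)
```

With both score lists on a common [0, 1] scale, the weight directly expresses how much the first-stage ranking should influence the final order.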

Example 2: Using a Pre-trained Sentence Transformer for Reranking

Sentence Transformers provide pre-trained models that can encode sentences and paragraphs into meaningful vector representations. We can use these for reranking.

from sentence_transformers import SentenceTransformer
import numpy as np

# Load a pre-trained Sentence Transformer model
model = SentenceTransformer('all-mpnet-base-v2')

def rerank_sentence_transformer(query, documents, document_vectors, initial_scores):
    """
    Reranks documents using a Sentence Transformer model.

    Args:
        query: The search query.
        documents: A list of document identifiers.
        document_vectors: A list of document vectors (pre-computed).
        initial_scores: Initial scores from HNSW search.

    Returns:
        A list of document identifiers sorted by reranked score.
    """
    query_embedding = model.encode(query)
    reranked_scores = []
    for i in range(len(documents)):
        cosine_similarity = np.dot(query_embedding, document_vectors[i]) / (np.linalg.norm(query_embedding) * np.linalg.norm(document_vectors[i]))
        reranked_score = cosine_similarity + initial_scores[i] * 0.2  # Combine initial score
        reranked_scores.append((documents[i], reranked_score))

    reranked_scores.sort(key=lambda x: x[1], reverse=True)
    return [doc for doc, score in reranked_scores]

# Example Usage
query = "best pizza in New York"
documents = ["doc1", "doc2", "doc3"]
document_vectors = [model.encode("This is a great pizza place."),
                    model.encode("New York has the best pizza."),
                    model.encode("Delicious Italian food in Brooklyn.")]
initial_scores = [0.8, 0.6, 0.9]

reranked_documents = rerank_sentence_transformer(query, documents, document_vectors, initial_scores)
print("Reranked Documents:", reranked_documents)

Explanation:

  • This example uses the all-mpnet-base-v2 Sentence Transformer model. You can explore other models based on your specific needs.
  • The rerank_sentence_transformer function encodes the query using the Sentence Transformer model.
  • It calculates the cosine similarity between the query embedding and each document embedding.
  • It combines the cosine similarity score with the initial HNSW score.

Example 3: Training a Simple Cross-Encoder (Illustrative)

While computationally expensive, cross-encoders offer high accuracy. This example outlines the training process (simplified for illustration). Note: This is not practical for large-scale reranking due to performance limitations.

# Simplified Cross-Encoder Training (Illustrative - Not Production Ready)
import torch
from torch.nn import Linear
from torch.optim import Adam
from torch.nn import functional as F

class CrossEncoder(torch.nn.Module):
    def __init__(self, embedding_dim):
        super().__init__()
        self.linear = Linear(2 * embedding_dim, 1)

    def forward(self, query_embedding, document_embedding):
        # Concatenate the two embeddings along the feature dimension.
        combined = torch.cat((query_embedding, document_embedding), dim=1)
        return torch.sigmoid(self.linear(combined))

# Dummy Training Data
training_data = [
    ("query1", "doc1", 0.9),  # Relevant
    ("query1", "doc2", 0.2),  # Not Relevant
    ("query2", "doc3", 0.7),  # Relevant
    ("query2", "doc4", 0.1),  # Not Relevant
]

# Model and Optimizer
model = CrossEncoder(embedding_dim=768)  # Assuming an embedding dimension of 768
optimizer = Adam(model.parameters(), lr=0.001)

# Training Loop
for epoch in range(10):
    for query, document, relevance in training_data:
        # Shape (1, 768): a batch of one. Replace with actual embeddings.
        query_embedding = torch.randn(1, 768)
        document_embedding = torch.randn(1, 768)

        optimizer.zero_grad()
        predicted_relevance = model(query_embedding, document_embedding)
        target = torch.tensor([[relevance]])  # match the model's (1, 1) output shape
        loss = F.binary_cross_entropy(predicted_relevance, target)
        loss.backward()
        optimizer.step()

print("Training complete")

Explanation:

  • This example defines a simple CrossEncoder model with a linear layer.
  • It uses dummy training data and a binary cross-entropy loss function.
  • Important: In a real-world scenario, you would use actual query and document embeddings and a much larger and more representative training dataset.
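Once trained, using such a cross-encoder for reranking is just a scoring loop over the candidate set. A sketch follows; the toy CrossEncoder architecture is repeated so the snippet is self-contained, and the random embeddings are placeholders for real query and document encodings.

```python
import torch
from torch.nn import Linear

class CrossEncoder(torch.nn.Module):
    # Same toy architecture as in the training example above.
    def __init__(self, embedding_dim):
        super().__init__()
        self.linear = Linear(2 * embedding_dim, 1)

    def forward(self, query_embedding, document_embedding):
        combined = torch.cat((query_embedding, document_embedding), dim=1)
        return torch.sigmoid(self.linear(combined))

def rerank_with_cross_encoder(model, query_embedding, documents, document_embeddings):
    # Score every candidate with one joint forward pass, then sort by score.
    model.eval()
    with torch.no_grad():
        scored = []
        for doc, doc_emb in zip(documents, document_embeddings):
            score = model(query_embedding.unsqueeze(0), doc_emb.unsqueeze(0)).item()
            scored.append((doc, score))
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored

# Example usage with random stand-in embeddings (replace with real ones).
torch.manual_seed(0)
model = CrossEncoder(embedding_dim=768)
query_emb = torch.randn(768)
docs = ["doc1", "doc2", "doc3"]
doc_embs = [torch.randn(768) for _ in docs]
ranked = rerank_with_cross_encoder(model, query_emb, docs, doc_embs)
print(ranked)
```

Because every candidate requires its own forward pass, this loop is exactly why the first stage must shrink the candidate set before the cross-encoder runs.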

Conclusion: Combining HNSW and Reranking

Reranking, when combined with HNSW, provides a powerful solution for improving search relevance. While computationally more intensive than HNSW alone, it lets you leverage more sophisticated models to refine the initial results and deliver a better user experience. The choice between cross-encoders and bi-encoders comes down to the trade-off between accuracy and performance. Experimentation and careful evaluation are key to finding the optimal configuration for your application. Remember to account for the cost of reranking when designing your search pipeline: the initial HNSW search should narrow the result set to a manageable size before the reranking model runs.

