Your vector similarity search is humming along at 50ms latency. Then at 2 AM, the node holding your product embeddings dies. Suddenly, your recommendation engine returns empty results, and customers see a blank "You might also like" section. Revenue drops by the minute.
In our previous post, we covered consistent hashing for distributing data across shards. But distribution alone doesn't protect you from node failures. That's where replication enters, and it's more nuanced than "just copy everything."
The Problem Replication Solves
A single node holding your vector index is a single point of failure. Hardware fails. Disks corrupt. Network partitions happen. The question isn't if your node will fail, but when.
Replication creates copies of your data across multiple nodes. When one dies, others continue serving requests. Simple concept, messy implementation.
The tradeoff you'll wrestle with: consistency vs. availability vs. latency. You can't maximize all three. Every replication strategy picks its poison.
Core Concept 1: Primary-Replica Architecture
What it is: One node (primary) accepts all writes. Other nodes (replicas) copy data from the primary and serve read requests.
The Problem It Solves: You need redundancy, but allowing writes everywhere creates conflicts. Who wins when two nodes update the same vector simultaneously? Primary-replica sidesteps this by funneling all writes through one authoritative source.
Analogy: Think of a production database migration gone wrong. The primary is your source-of-truth database; replicas are read-only copies you can query without hammering the primary during peak load.
Simple Example:
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VectorRecord:
    id: str
    embedding: list[float]
    timestamp: float = field(default_factory=time.time)

class ReplicaNode:
    def __init__(self, node_id: str, is_primary: bool = False):
        self.node_id = node_id
        self.is_primary = is_primary
        self.vectors: dict[str, VectorRecord] = {}
        self.write_log: deque = deque(maxlen=10000)  # Replication log
        self.last_applied_offset: int = 0

    def write(self, record: VectorRecord) -> bool:
        """Only primary accepts writes directly."""
        if not self.is_primary:
            raise PermissionError(f"Node {self.node_id} is replica. Writes rejected.")
        self.vectors[record.id] = record
        self.write_log.append(record)
        print(f"[PRIMARY {self.node_id}] Wrote vector {record.id}")
        return True

    def read(self, vector_id: str) -> Optional[VectorRecord]:
        """Any node can serve reads."""
        return self.vectors.get(vector_id)

    def get_replication_log(self, from_offset: int) -> list[VectorRecord]:
        """Returns log entries for replicas to fetch."""
        return list(self.write_log)[from_offset:]

# Setup: 1 primary, 2 replicas
primary = ReplicaNode("node-0", is_primary=True)
replica_1 = ReplicaNode("node-1", is_primary=False)
replica_2 = ReplicaNode("node-2", is_primary=False)

# Write to primary
primary.write(VectorRecord(id="product-001", embedding=[0.1, 0.2, 0.3, 0.4]))
primary.write(VectorRecord(id="product-002", embedding=[0.5, 0.6, 0.7, 0.8]))

# Replica attempting write fails
try:
    replica_1.write(VectorRecord(id="product-003", embedding=[0.9, 0.8, 0.7, 0.6]))
except PermissionError as e:
    print(f"[ERROR] {e}")
Output:
[PRIMARY node-0] Wrote vector product-001
[PRIMARY node-0] Wrote vector product-002
[ERROR] Node node-1 is replica. Writes rejected.
Where It Breaks: All writes funnel through one node. If your write volume exceeds what a single node can handle, you're stuck. Sharding helps here, but then you need replication per shard, and complexity compounds fast.
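To see how the two compose, here is a minimal sketch of routing each write to a per-shard primary. Everything here is illustrative: the class name, node names, and the simple hash-mod shard assignment stand in for the consistent-hashing ring from the previous post.

```python
import hashlib

class ShardedPrimaryRouter:
    """Routes each write to the primary of the shard that owns the key.

    Sketch only: shard assignment is hash-mod, not a consistent-hashing
    ring, so adding a shard would reshuffle most keys.
    """

    def __init__(self, shard_primaries: list[str]):
        self.shard_primaries = shard_primaries  # one primary node per shard

    def primary_for(self, vector_id: str) -> str:
        digest = hashlib.md5(vector_id.encode()).hexdigest()
        shard = int(digest, 16) % len(self.shard_primaries)
        return self.shard_primaries[shard]

router = ShardedPrimaryRouter(["shard0-primary", "shard1-primary", "shard2-primary"])
print(router.primary_for("product-001"))  # deterministic per key
```

Each shard's primary then replicates to that shard's replicas exactly as above, so a 3-shard, 3-copy cluster is already 9 nodes to operate.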
Core Concept 2: Synchronous vs. Asynchronous Replication
What it is: Synchronous replication waits for replicas to confirm before acknowledging writes. Asynchronous confirms immediately and replicates in the background.
The Problem It Solves: You're choosing between data safety and speed. Lose a primary before async replication completes? Those writes are gone.
Analogy: Itโs like the difference between a database transaction that commits only after the backup confirms (sync), versus fire-and-forget logging that might lose the last few entries during a crash (async).
Synchronous Replication Example:
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

class SyncReplicatedNode:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.vectors: dict[str, list[float]] = {}
        self.replicas: list['SyncReplicatedNode'] = []

    def add_replica(self, replica: 'SyncReplicatedNode'):
        self.replicas.append(replica)

    def write_sync(self, vector_id: str, embedding: list[float], timeout: float = 2.0) -> bool:
        """
        Synchronous write: blocks until ALL replicas confirm.
        Tradeoff: latency equals slowest replica + network round-trip.
        """
        # Write locally first
        self.vectors[vector_id] = embedding
        if not self.replicas:
            return True

        # Replicate to all replicas, wait for confirmation
        def replicate_to(replica: 'SyncReplicatedNode') -> bool:
            try:
                # Simulate network latency (real systems: gRPC/HTTP call)
                time.sleep(0.05)  # 50ms network latency
                replica.vectors[vector_id] = embedding
                return True
            except Exception:
                return False

        start = time.time()
        with ThreadPoolExecutor(max_workers=len(self.replicas)) as executor:
            futures = [executor.submit(replicate_to, r) for r in self.replicas]
            try:
                results = [f.result(timeout=timeout) for f in futures]
            except TimeoutError:
                print(f"[WARN] Replication timeout after {timeout}s")
                return False

        elapsed_ms = (time.time() - start) * 1000
        success = all(results)
        print(f"[SYNC] Wrote {vector_id} to {len(self.replicas)} replicas in {elapsed_ms:.1f}ms")
        return success

# Setup
primary = SyncReplicatedNode("primary")
replica_a = SyncReplicatedNode("replica-a")
replica_b = SyncReplicatedNode("replica-b")
primary.add_replica(replica_a)
primary.add_replica(replica_b)

# Synchronous write - blocks until both replicas confirm
primary.write_sync("vec-001", [0.1, 0.2, 0.3, 0.4])
primary.write_sync("vec-002", [0.5, 0.6, 0.7, 0.8])

print(f"\nPrimary vectors: {list(primary.vectors.keys())}")
print(f"Replica-A vectors: {list(replica_a.vectors.keys())}")
print(f"Replica-B vectors: {list(replica_b.vectors.keys())}")
Output:
[SYNC] Wrote vec-001 to 2 replicas in 52.3ms
[SYNC] Wrote vec-002 to 2 replicas in 51.8ms
Primary vectors: ['vec-001', 'vec-002']
Replica-A vectors: ['vec-001', 'vec-002']
Replica-B vectors: ['vec-001', 'vec-002']
Asynchronous Replication Example:
import threading
import queue
import time

class AsyncReplicatedNode:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.vectors: dict[str, list[float]] = {}
        self.replicas: list['AsyncReplicatedNode'] = []
        self.replication_queue: queue.Queue = queue.Queue()
        self._start_replication_worker()

    def _start_replication_worker(self):
        """Background thread that processes the replication queue."""
        def worker():
            while True:
                try:
                    vector_id, embedding = self.replication_queue.get(timeout=1.0)
                    for replica in self.replicas:
                        time.sleep(0.05)  # Simulate network latency
                        replica.vectors[vector_id] = embedding
                    print(f"[ASYNC WORKER] Replicated {vector_id} to {len(self.replicas)} replicas")
                    self.replication_queue.task_done()
                except queue.Empty:
                    continue

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()

    def add_replica(self, replica: 'AsyncReplicatedNode'):
        self.replicas.append(replica)

    def write_async(self, vector_id: str, embedding: list[float]) -> bool:
        """
        Async write: returns immediately after local write.
        Replication happens in background. DANGER: data loss possible!
        """
        start = time.time()
        self.vectors[vector_id] = embedding
        self.replication_queue.put((vector_id, embedding))
        elapsed_ms = (time.time() - start) * 1000
        print(f"[ASYNC] Wrote {vector_id} locally in {elapsed_ms:.2f}ms (replication queued)")
        return True

# Setup
primary = AsyncReplicatedNode("primary")
replica_a = AsyncReplicatedNode("replica-a")
replica_b = AsyncReplicatedNode("replica-b")
primary.add_replica(replica_a)
primary.add_replica(replica_b)

# Async writes return immediately
primary.write_async("vec-001", [0.1, 0.2, 0.3, 0.4])
primary.write_async("vec-002", [0.5, 0.6, 0.7, 0.8])
primary.write_async("vec-003", [0.9, 0.8, 0.7, 0.6])

# Check immediately - replicas may not have data yet!
print("\n[IMMEDIATE CHECK]")
print(f"Primary vectors: {list(primary.vectors.keys())}")
print(f"Replica-A vectors: {list(replica_a.vectors.keys())}")  # Likely empty!

# Wait for replication to complete
time.sleep(0.5)
print("\n[AFTER 500ms]")
print(f"Replica-A vectors: {list(replica_a.vectors.keys())}")
print(f"Replica-B vectors: {list(replica_b.vectors.keys())}")
Output:
[ASYNC] Wrote vec-001 locally in 0.02ms (replication queued)
[ASYNC] Wrote vec-002 locally in 0.01ms (replication queued)
[ASYNC] Wrote vec-003 locally in 0.01ms (replication queued)
[IMMEDIATE CHECK]
Primary vectors: ['vec-001', 'vec-002', 'vec-003']
Replica-A vectors: []
[ASYNC WORKER] Replicated vec-001 to 2 replicas
[ASYNC WORKER] Replicated vec-002 to 2 replicas
[ASYNC WORKER] Replicated vec-003 to 2 replicas
[AFTER 500ms]
Replica-A vectors: ['vec-001', 'vec-002', 'vec-003']
Replica-B vectors: ['vec-001', 'vec-002', 'vec-003']
Where It Breaks:
- Sync: One slow replica tanks your entire write latency. A replica in another region with 200ms RTT means every write takes 200ms minimum. We've seen teams add a third replica for redundancy, only to watch p99 latency triple because that replica was in a different availability zone.
- Async: Read-after-write inconsistency. User updates their profile embedding, immediately searches, gets stale results because the replica hasn't caught up. They refresh, and it works fine. Intermittent bugs like this are maddening to debug.
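A common client-side mitigation for the async case is read-your-writes: remember the version of your last write and fall back to the primary when a replica is behind. A minimal sketch, with plain dicts standing in for real nodes (the class and record shape are illustrative, not from the code above):

```python
class VersionAwareClient:
    """Reads from a replica only if it has caught up to this client's
    last write. Nodes are plain dicts: vector_id -> (embedding, version)."""

    def __init__(self, primary: dict, replica: dict):
        self.primary = primary
        self.replica = replica
        self.last_written_version = 0

    def write(self, vector_id: str, embedding: list[float], version: int):
        self.primary[vector_id] = (embedding, version)
        self.last_written_version = version
        # (async replication to self.replica happens elsewhere)

    def read(self, vector_id: str):
        entry = self.replica.get(vector_id)
        if entry and entry[1] >= self.last_written_version:
            return entry  # replica has caught up; cheap local read
        return self.primary.get(vector_id)  # fall back to primary

primary, replica = {}, {}
client = VersionAwareClient(primary, replica)
client.write("vec-1", [0.1, 0.2], version=1)
print(client.read("vec-1"))  # ([0.1, 0.2], 1) -- replica empty, served from primary
```

The cost is extra read load on the primary right after writes, which is usually a fair trade for killing the "refresh and it works" bug class.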
Core Concept 3: Quorum-Based Replication
What it is: Instead of waiting for all replicas (sync) or none (async), wait for a majority. With 3 replicas, wait for 2 confirmations.
The Problem It Solves: Pure sync is too slow. Pure async loses data. Quorum finds the middle ground: tolerate some replica failures without blocking writes.
The Math: For N replicas with quorum size Q:
- Write quorum (W): Number of replicas that must confirm a write
- Read quorum (R): Number of replicas to query for reads
- Consistency guarantee: If W + R > N, you're guaranteed to read the latest write
import time
import random
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass

@dataclass
class QuorumConfig:
    total_replicas: int
    write_quorum: int
    read_quorum: int

    def __post_init__(self):
        # Validate quorum overlap for consistency
        if self.write_quorum + self.read_quorum <= self.total_replicas:
            print(f"[WARN] W({self.write_quorum}) + R({self.read_quorum}) <= N({self.total_replicas})")
            print("       Reads may return stale data!")

class QuorumReplicatedStore:
    def __init__(self, config: QuorumConfig):
        self.config = config
        self.replicas: list[dict[str, tuple[list[float], int]]] = [
            {} for _ in range(config.total_replicas)
        ]
        self.version_counter = 0

    def _simulate_replica_latency(self, replica_idx: int) -> float:
        """Simulate variable network latency per replica."""
        base_latency = 0.02  # 20ms base
        jitter = random.uniform(0, 0.08)  # 0-80ms jitter
        # Replica 2 is in a far region - always slow
        if replica_idx == 2:
            base_latency = 0.15  # 150ms for distant replica
        return base_latency + jitter

    def write(self, vector_id: str, embedding: list[float]) -> bool:
        """
        Quorum write: succeeds when write_quorum replicas confirm.
        """
        self.version_counter += 1
        version = self.version_counter

        def write_to_replica(idx: int) -> tuple[int, bool, float]:
            latency = self._simulate_replica_latency(idx)
            time.sleep(latency)
            self.replicas[idx][vector_id] = (embedding, version)
            return (idx, True, latency * 1000)

        start = time.time()
        successful_writes = 0
        executor = ThreadPoolExecutor(max_workers=self.config.total_replicas)
        futures = [executor.submit(write_to_replica, i)
                   for i in range(self.config.total_replicas)]
        for future in as_completed(futures):
            idx, success, latency_ms = future.result()
            if success:
                successful_writes += 1
                print(f"  Replica {idx} confirmed in {latency_ms:.1f}ms")
            # Quorum reached! Don't wait for slow replicas
            if successful_writes >= self.config.write_quorum:
                elapsed_ms = (time.time() - start) * 1000
                print(f"[QUORUM] Write succeeded with {successful_writes}/{self.config.total_replicas} "
                      f"replicas in {elapsed_ms:.1f}ms")
                # Let stragglers finish in the background instead of blocking
                # (a `with` block here would wait for them on exit)
                executor.shutdown(wait=False)
                return True
        executor.shutdown(wait=True)
        return False

    def read(self, vector_id: str) -> tuple[list[float], int] | None:
        """
        Quorum read: query read_quorum replicas, return highest version.
        """
        def read_from_replica(idx: int):
            latency = self._simulate_replica_latency(idx)
            time.sleep(latency)
            return (idx, self.replicas[idx].get(vector_id))

        results = []
        executor = ThreadPoolExecutor(max_workers=self.config.total_replicas)
        futures = [executor.submit(read_from_replica, i)
                   for i in range(self.config.total_replicas)]
        for future in as_completed(futures):
            idx, data = future.result()
            if data:
                results.append(data)
            if len(results) >= self.config.read_quorum:
                executor.shutdown(wait=False)
                # Return highest version
                return max(results, key=lambda x: x[1])
        executor.shutdown(wait=True)
        return None

# N=3, W=2, R=2 -> Consistent (2+2 > 3)
config = QuorumConfig(total_replicas=3, write_quorum=2, read_quorum=2)
store = QuorumReplicatedStore(config)

print("Writing vector-001...")
store.write("vector-001", [0.1, 0.2, 0.3, 0.4])
print("\nWriting vector-002...")
store.write("vector-002", [0.5, 0.6, 0.7, 0.8])
Output:
Writing vector-001...
Replica 0 confirmed in 45.2ms
Replica 1 confirmed in 67.8ms
[QUORUM] Write succeeded with 2/3 replicas in 68.1ms
Writing vector-002...
Replica 1 confirmed in 38.4ms
Replica 0 confirmed in 52.1ms
[QUORUM] Write succeeded with 2/3 replicas in 52.4ms
Notice: Replica 2 (the slow one at 150ms+) never blocked the write. Quorum lets fast replicas drive performance while slow ones catch up eventually.
Where It Breaks: Quorum assumes replicas are roughly equal. If you have 5 replicas but 3 are in one datacenter and 2 in another, a datacenter outage might leave you unable to reach quorum, even though the surviving nodes are technically "up."
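One way to catch this before it bites is to check the placement up front: for every single-datacenter outage, do the survivors still reach the write quorum? A small sketch (the placement format is made up for illustration):

```python
def quorum_survives_dc_loss(placement: dict[str, int], write_quorum: int) -> bool:
    """placement maps datacenter name -> replica count.

    Returns True only if losing ANY single datacenter still leaves
    enough replicas to reach the write quorum.
    """
    total = sum(placement.values())
    for dc, count in placement.items():
        if total - count < write_quorum:
            return False  # losing this datacenter blocks all writes
    return True

# 5 replicas, W=3: 3 in dc-east, 2 in dc-west -> losing dc-east kills quorum
print(quorum_survives_dc_loss({"dc-east": 3, "dc-west": 2}, write_quorum=3))  # False
# Spread 2/2/1 across three datacenters instead -> any single loss survives
print(quorum_survives_dc_loss({"dc-a": 2, "dc-b": 2, "dc-c": 1}, write_quorum=3))  # True
```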
Core Concept 4: Failover and Leader Election
What it is: When the primary dies, replicas must elect a new leader to accept writes.
The Problem It Solves: Without automatic failover, a primary failure means manual intervention at 3 AM. With failover, the system self-heals in seconds.
import time
import random
import threading
from enum import Enum
from typing import Optional

class NodeState(Enum):
    LEADER = "leader"
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    DEAD = "dead"

class FailoverNode:
    def __init__(self, node_id: str, peers: Optional[list['FailoverNode']] = None):
        self.node_id = node_id
        self.state = NodeState.FOLLOWER
        self.peers = peers or []
        self.current_term = 0
        self.voted_for: Optional[str] = None
        self.last_heartbeat = time.time()
        self.election_timeout = random.uniform(0.15, 0.3)  # 150-300ms
        self._lock = threading.Lock()

    def receive_heartbeat(self, from_leader: str, term: int):
        """Follower receives heartbeat from leader."""
        with self._lock:
            if term >= self.current_term:
                self.current_term = term
                self.state = NodeState.FOLLOWER
                self.last_heartbeat = time.time()
                self.voted_for = None

    def request_vote(self, candidate_id: str, term: int) -> bool:
        """Candidate requests vote from this node."""
        with self._lock:
            if term > self.current_term:
                self.current_term = term
                self.voted_for = None
            if self.voted_for is None or self.voted_for == candidate_id:
                self.voted_for = candidate_id
                return True
            return False

    def start_election(self):
        """Initiate leader election."""
        with self._lock:
            self.state = NodeState.CANDIDATE
            self.current_term += 1
            self.voted_for = self.node_id
            term = self.current_term
        print(f"[{self.node_id}] Starting election for term {term}")

        votes = 1  # Vote for self
        for peer in self.peers:
            if peer.state != NodeState.DEAD:
                if peer.request_vote(self.node_id, term):
                    votes += 1
                    print(f"[{self.node_id}] Received vote from {peer.node_id}")

        # Need majority
        if votes > (len(self.peers) + 1) / 2:
            with self._lock:
                self.state = NodeState.LEADER
            print(f"[{self.node_id}] Elected leader with {votes} votes!")
            return True
        print(f"[{self.node_id}] Election failed. Got {votes} votes, needed {(len(self.peers) + 1) // 2 + 1}")
        return False

    def check_leader_timeout(self) -> bool:
        """Check if leader heartbeat timed out."""
        if self.state == NodeState.LEADER or self.state == NodeState.DEAD:
            return False
        return (time.time() - self.last_heartbeat) > self.election_timeout

    def kill(self):
        """Simulate node failure."""
        self.state = NodeState.DEAD
        print(f"[{self.node_id}] NODE DIED")

# Setup cluster
node_a = FailoverNode("node-A")
node_b = FailoverNode("node-B")
node_c = FailoverNode("node-C")

# Connect peers
node_a.peers = [node_b, node_c]
node_b.peers = [node_a, node_c]
node_c.peers = [node_a, node_b]

# Initial election - node_a becomes leader
print("=== Initial Election ===")
node_a.state = NodeState.LEADER
node_a.current_term = 1
print("[node-A] Started as leader (term 1)")

# Simulate heartbeats
for peer in node_a.peers:
    peer.receive_heartbeat("node-A", 1)

print("\nCluster state:")
for node in [node_a, node_b, node_c]:
    print(f"  {node.node_id}: {node.state.value}")

# Kill the leader!
print("\n=== Leader Failure ===")
node_a.kill()

# Wait past the maximum election timeout (0.3s) so the timeout
# check below is guaranteed to fire
time.sleep(0.35)
print("\n=== Follower Detects Timeout ===")

# node_b detects leader is dead and starts election
if node_b.check_leader_timeout():
    print("[node-B] Leader timeout detected!")
    node_b.start_election()

print("\nNew cluster state:")
for node in [node_a, node_b, node_c]:
    print(f"  {node.node_id}: {node.state.value}")
Output:
=== Initial Election ===
[node-A] Started as leader (term 1)
Cluster state:
node-A: leader
node-B: follower
node-C: follower
=== Leader Failure ===
[node-A] NODE DIED
=== Follower Detects Timeout ===
[node-B] Leader timeout detected!
[node-B] Starting election for term 2
[node-B] Received vote from node-C
[node-B] Elected leader with 2 votes!
New cluster state:
node-A: dead
node-B: leader
node-C: follower
Where It Breaks: Split-brain. If network partitions the cluster into two groups, each group might elect its own leader. Now you have two primaries accepting conflicting writes. This is why production systems use odd numbers of replicas and carefully designed network topologies.
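The defense against split-brain is the majority requirement itself: a partition can only elect a leader with a strict majority of the original cluster, so at most one side can ever win. A quick sketch of the arithmetic:

```python
def can_elect_leader(partition_size: int, cluster_size: int) -> bool:
    """A partition can elect a leader only with a strict majority of the
    ORIGINAL cluster -- votes from unreachable nodes don't count."""
    return partition_size > cluster_size / 2

# 4-node cluster split 2/2: neither side has a majority -> no leader at all
print(can_elect_leader(2, 4))  # False (for both halves)
# 5-node cluster split 3/2: exactly one side can elect -> no split-brain
print(can_elect_leader(3, 5), can_elect_leader(2, 5))  # True False
```

This is the arithmetic behind the odd-replica advice: an even cluster can split into two tied halves where neither side elects anyone, while an odd cluster always leaves one side with a majority.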
Practical Walkthrough: Building a Replicated Vector Store
Let's combine everything into a minimal but functional replicated vector store:
import time
import threading
from typing import Optional
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor, as_completed

@dataclass
class Vector:
    id: str
    embedding: list[float]
    version: int

class ReplicatedVectorStore:
    """
    A simple replicated vector store demonstrating:
    - Primary-replica architecture
    - Configurable sync/async replication
    - Basic failover support
    """
    def __init__(self, node_id: str, is_primary: bool = False):
        self.node_id = node_id
        self.is_primary = is_primary
        self.vectors: dict[str, Vector] = {}
        self.replicas: list['ReplicatedVectorStore'] = []
        self.version = 0
        self._lock = threading.Lock()
        # Replication settings
        self.replication_mode = "sync"  # "sync" or "async"

    def add_replica(self, replica: 'ReplicatedVectorStore'):
        self.replicas.append(replica)

    def put(self, vector_id: str, embedding: list[float]) -> dict:
        """Store a vector with replication."""
        if not self.is_primary:
            return {"error": "Write rejected: not primary", "redirect_to": "primary"}

        with self._lock:
            self.version += 1
            vec = Vector(id=vector_id, embedding=embedding, version=self.version)
            self.vectors[vector_id] = vec

        # Replicate based on mode
        start = time.time()
        if self.replication_mode == "sync":
            success_count = self._replicate_sync(vec)
            elapsed = (time.time() - start) * 1000
            return {
                "status": "ok",
                "version": vec.version,
                "replicated_to": success_count,
                "latency_ms": round(elapsed, 2),
            }
        else:
            self._replicate_async(vec)
            elapsed = (time.time() - start) * 1000
            return {
                "status": "ok",
                "version": vec.version,
                "replication": "pending",
                "latency_ms": round(elapsed, 2),
            }

    def _replicate_sync(self, vec: Vector) -> int:
        """Synchronous replication - wait for all replicas."""
        if not self.replicas:
            return 0  # a freshly promoted primary may have no replicas yet

        def send_to_replica(replica):
            time.sleep(0.03)  # Simulate 30ms network
            with replica._lock:
                replica.vectors[vec.id] = vec
            return True

        success = 0
        with ThreadPoolExecutor(max_workers=len(self.replicas)) as executor:
            futures = {executor.submit(send_to_replica, r): r for r in self.replicas}
            for future in as_completed(futures):
                if future.result():
                    success += 1
        return success

    def _replicate_async(self, vec: Vector):
        """Async replication - fire and forget."""
        def background_replicate():
            for replica in self.replicas:
                time.sleep(0.03)
                with replica._lock:
                    replica.vectors[vec.id] = vec

        thread = threading.Thread(target=background_replicate, daemon=True)
        thread.start()

    def get(self, vector_id: str) -> Optional[Vector]:
        """Retrieve a vector."""
        return self.vectors.get(vector_id)

    def search(self, query_embedding: list[float], top_k: int = 5) -> list[tuple[str, float]]:
        """
        Simple brute-force similarity search.
        In production, you'd use HNSW or IVF indexes.
        """
        def cosine_sim(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm_a = sum(x * x for x in a) ** 0.5
            norm_b = sum(x * x for x in b) ** 0.5
            return dot / (norm_a * norm_b + 1e-10)

        scores = []
        for vec in self.vectors.values():
            sim = cosine_sim(query_embedding, vec.embedding)
            scores.append((vec.id, sim))
        scores.sort(key=lambda x: x[1], reverse=True)
        return scores[:top_k]

    def promote_to_primary(self):
        """Emergency promotion during failover."""
        self.is_primary = True
        print(f"[{self.node_id}] PROMOTED TO PRIMARY")

# Demo: Setup and test
print("=== Setting up replicated cluster ===")
primary = ReplicatedVectorStore("primary", is_primary=True)
replica1 = ReplicatedVectorStore("replica-1")
replica2 = ReplicatedVectorStore("replica-2")
primary.add_replica(replica1)
primary.add_replica(replica2)

# Test sync replication
print("\n=== Synchronous Writes ===")
primary.replication_mode = "sync"
result = primary.put("doc-001", [0.1, 0.2, 0.3, 0.4])
print(f"Write result: {result}")
result = primary.put("doc-002", [0.5, 0.6, 0.7, 0.8])
print(f"Write result: {result}")

# Verify replication
print(f"\nPrimary has: {list(primary.vectors.keys())}")
print(f"Replica-1 has: {list(replica1.vectors.keys())}")
print(f"Replica-2 has: {list(replica2.vectors.keys())}")

# Test async replication
print("\n=== Async Writes ===")
primary.replication_mode = "async"
result = primary.put("doc-003", [0.9, 0.8, 0.7, 0.6])
print(f"Write result: {result}")

# Check immediately - async may not be done
print(f"Replica-1 has (immediate): {list(replica1.vectors.keys())}")
time.sleep(0.1)
print(f"Replica-1 has (after 100ms): {list(replica1.vectors.keys())}")

# Test search (any node can serve reads)
print("\n=== Search on Replica ===")
query = [0.15, 0.25, 0.35, 0.45]
results = replica1.search(query, top_k=3)
print(f"Top results: {results}")

# Simulate failover
print("\n=== Failover Scenario ===")
print("Primary dies! Promoting replica-1...")
primary.is_primary = False  # Simulated death
replica1.promote_to_primary()

# New primary accepts writes
result = replica1.put("doc-004", [0.2, 0.3, 0.4, 0.5])
print(f"New primary write: {result}")
Advanced Tips & Production Reality
Replication lag monitoring is non-negotiable. We've seen systems where async replicas fell hours behind during traffic spikes. Reads were serving data from yesterday. Add metrics: replication_lag_seconds, replication_queue_depth.
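The lag metric itself can be simple: compare the primary's newest write timestamp with the newest write the replica has applied. A sketch (the function name mirrors the metric above; where those timestamps come from is assumed plumbing):

```python
import time

def replication_lag_seconds(primary_last_write_ts: float,
                            replica_last_applied_ts: float) -> float:
    """Seconds of writes the replica has not yet applied.

    Clamped at zero: a replica can't be 'ahead' of the primary, so a
    negative value just means clock skew between the two timestamps.
    """
    return max(0.0, primary_last_write_ts - replica_last_applied_ts)

now = time.time()
lag = replication_lag_seconds(now, now - 4.2)
print(f"replication_lag_seconds: {lag:.1f}")  # ~4.2 -- alert if this trends up
```

replication_queue_depth is even simpler in the async design above: it's just the Queue's qsize() sampled periodically.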
Geographic replication multiplies complexity. Cross-region replication means 100-300ms per round-trip. Sync replication becomes unusable. Most teams use async with conflict resolution, but conflict resolution for vector embeddings is... interesting. Which embedding "wins" when the same document is re-embedded differently in two regions?
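The usual fallback is last-write-wins on a timestamp, which at least keeps regions convergent even though it can't say which embedding is semantically newer. A minimal sketch (the record shape is illustrative):

```python
def resolve_conflict(record_a: tuple[list[float], float],
                     record_b: tuple[list[float], float]) -> list[float]:
    """Last-write-wins: each record is (embedding, wall-clock timestamp).

    Convergent but crude -- with cross-region clock skew, the "winner"
    may not actually be the later re-embedding.
    """
    emb_a, ts_a = record_a
    emb_b, ts_b = record_b
    return emb_a if ts_a >= ts_b else emb_b

us_east = ([0.1, 0.2], 1700000000.0)  # document embedded in us-east
eu_west = ([0.3, 0.4], 1700000005.0)  # same document, re-embedded 5s later
print(resolve_conflict(us_east, eu_west))  # [0.3, 0.4] -- later write wins
```

Both regions applying this rule to the same pair of records pick the same winner, which is the whole point: convergence, not correctness.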
The learned-the-hard-way moment: We once had a 3-node cluster with quorum writes (W=2). Deployed a bad config that made one replica reject all writes. The system still "worked" because 2 nodes hit quorum. A week later, that node came back online with stale data, got elected leader during a restart, and served week-old embeddings until someone noticed search quality tanked.
Actionable Takeaways
- Start with async replication unless you have strict consistency requirements; sync kills latency
- Use quorum (W=2, R=2, N=3) as your default; it balances consistency and availability
- Monitor replication lag obsessively; stale reads are silent failures
- Test failover regularly; don't discover it's broken at 3 AM
- Use an odd number of replicas so leader elections can't deadlock on a tied vote
- Geographic replication requires async; accept eventual consistency or accept latency
| Replication Mode | Latency | Consistency | Data Safety |
|---|---|---|---|
| Synchronous | High (slowest replica) | Strong | High |
| Asynchronous | Low (local write only) | Eventual | Risk of loss |
| Quorum (W=2, N=3) | Medium (2nd fastest) | Tunable | Good |
Conclusion
Replication isn't just "copy data to more nodes." It's a series of tradeoffs between consistency, availability, and latency. Synchronous gives you safety but kills performance. Asynchronous gives you speed but risks data loss. Quorum finds middle ground but adds complexity.
The right choice depends on your pain tolerance. Can you afford stale reads? Can you afford slow writes? Can you afford 3 AM pages when failover breaks?
In the next post, we'll tackle distributed indexing: how to make similarity search fast across a replicated, sharded cluster. That's where things get genuinely hard.