PostgreSQL Sharding: When to Scale Horizontally and How to Do It Right

PostgreSQL is excellent—until it isn't. At some point, your single database hits a wall. Here's how to know when you need sharding, and how to do it right.

Sharding is a double-edged sword: it solves single-node scaling limits but introduces cross-shard consistency, query complexity, and operational overhead.

Signs You Need Sharding

Your situation calls for sharding if:

Single table > 100GB and hot (write-heavy)
> 1 billion rows in a single table
Vertical scaling exhausted: Biggest instance still can't keep up

Before You Shard, Exhaust These Alternatives

Alternative	When It Works
Better indexing	80% of performance issues are missing indexes
Read replicas	100K QPS reads → 1 primary + 5 replicas
Caching	Redis in front—90% requests from cache
Table partitioning	Time-series data by month

PostgreSQL Partitioning: The First Step

Three Strategies

-- HASH PARTITIONING: Distribute by hash
CREATE TABLE users (
    user_id BIGSERIAL,
    name TEXT
) PARTITION BY HASH (user_id);

-- RANGE PARTITIONING: Distribute by range (time-series)
CREATE TABLE events (
    id BIGSERIAL,
    created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);

-- LIST PARTITIONING: Distribute by discrete values
CREATE TABLE orders (
    order_id BIGSERIAL,
    country TEXT
) PARTITION BY LIST (country);

When to Use Each

Strategy	Best For	Query Pattern
Hash	Even distribution, random access	Single-shard lookups
Range	Time-series, archival	Date-range queries
List	Categorical data	IN queries by category

Citus: PostgreSQL Extension for Horizontal Scaling

Citus transforms a single PostgreSQL into a distributed system—transparently.

Setup

CREATE EXTENSION citus;

SELECT citus_add_node('worker1.example.com', 5432);
SELECT citus_add_node('worker2.example.com', 5432);

Creating Distributed Tables

CREATE TABLE events (
    tenant_id BIGINT NOT NULL,
    event_id BIGINT NOT NULL,
    event_data JSONB
);

SELECT create_distributed_table('events', 'tenant_id');

The Cross-Shard Query Problem

This is where sharding gets painful.

Fast: Single-Shard Query

-- Goes to ONE shard (fast, ~10ms)
SELECT * FROM orders WHERE user_id = 42;

Slow: Cross-Shard Query

-- Fans out to ALL shards (slow, ~500ms)
SELECT COUNT(*) FROM orders GROUP BY country;

Design queries to include the distribution key.

TL;DR

Partitioning: Start with PostgreSQL native (range, list, hash)
Citus: Use when single-node limits hit
Cross-shard pain: Always design for single-shard queries
Alternatives first: Indexes, replicas, caching

Ready to scale? Start with partitioning, move to Citus, only then consider application-level sharding.