PostgreSQL 18 Streaming Replication Setup
Overview
Section titled “Overview”This guide provides step-by-step instructions for setting up PostgreSQL 18.1 Streaming Replication on Debian/PGDG using a Primary-Standby architecture.
Architecture:
- Primary Server:
10.0.0.90(receives all writes) - Standby Servers:
10.0.0.91,10.0.0.92(read replicas) - Replication Type: Physical Streaming Replication (Standby can read / receives continuous WAL)
Prerequisites & Checklist
Section titled “Prerequisites & Checklist”Before starting, ensure the following conditions are met:
- PostgreSQL Major version must be identical (18.x) on all servers
- Network connectivity: Standby (91/92) must be able to reach Primary (90) on port
5432 - Running as Debian/PGDG cluster (has
pg_lsclusters) or at least know thedata_directorypath - You have sudo/root access on all servers
Verify Critical File Paths (All Servers)
Section titled “Verify Critical File Paths (All Servers)”Run these commands on all servers to verify paths:
sudo -u postgres psql -c "SHOW data_directory; SHOW config_file; SHOW hba_file;"pg_lsclustersExpected Output:
data_directory | config_file | hba_file-------------------------------------------+---------------------------------------+-------------------------------------- /var/lib/postgresql/18/main | /etc/postgresql/18/main/postgresql.conf | /etc/postgresql/18/main/pg_hba.confStep 1: Primary Server Setup (10.0.0.90)
Section titled “Step 1: Primary Server Setup (10.0.0.90)”1.1 Configure postgresql.conf
Section titled “1.1 Configure postgresql.conf”Edit the PostgreSQL configuration file (typically /etc/postgresql/18/main/postgresql.conf):
sudo nano /etc/postgresql/18/main/postgresql.confAdd or modify the following parameters:
# Network Configurationlisten_addresses = '*'
# Replication Configurationwal_level = replicamax_wal_senders = 10max_replication_slots = 10
# Prevent WAL from growing indefinitely when slot/standby has issues# Adjust based on actual disk space (e.g., 5-50GB)max_slot_wal_keep_size = '10GB'Parameter Explanations:
| Parameter | Value | Purpose |
|---|---|---|
listen_addresses | '*' | Allow connections from all network interfaces |
wal_level | replica | Enable WAL logging for replication |
max_wal_senders | 10 | Maximum concurrent WAL sender processes |
max_replication_slots | 10 | Maximum replication slots allowed |
max_slot_wal_keep_size | '10GB' | Prevent WAL files from filling disk |
1.2 Create Replication User
Section titled “1.2 Create Replication User”Create a dedicated PostgreSQL user with replication privileges:
sudo -u postgres psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'ChangeMeStrongPassword';"1.3 Configure Client Authentication (pg_hba.conf)
Section titled “1.3 Configure Client Authentication (pg_hba.conf)”Edit the HBA configuration file (typically /etc/postgresql/18/main/pg_hba.conf):
sudo nano /etc/postgresql/18/main/pg_hba.confAdd the following lines at the end of the file:
# Allow replication from standbyshost replication replicator 10.0.0.91/32 scram-sha-256host replication replicator 10.0.0.92/32 scram-sha-256HBA Rule Breakdown:
| Column | Value | Meaning |
|---|---|---|
host | TCP/IP connection | Connection type |
replication | special database | Replication connection |
replicator | username | User allowed to connect |
10.0.0.91/32 | IP address | Standby server IP |
scram-sha-256 | auth method | Secure password authentication |
1.4 Restart PostgreSQL on Primary
Section titled “1.4 Restart PostgreSQL on Primary”Apply the configuration changes by restarting PostgreSQL:
# Using systemctlsudo systemctl restart postgresql@18-main
# OR using pg_ctlclustersudo pg_ctlcluster 18 main restartVerify Service Status:
sudo systemctl status postgresql@18-mainStep 2: Standby Server Setup (10.0.0.91 and 10.0.0.92)
Section titled “Step 2: Standby Server Setup (10.0.0.91 and 10.0.0.92)”2.1 Stop PostgreSQL
Section titled “2.1 Stop PostgreSQL”Stop the PostgreSQL service on the Standby server:
# Using systemctlsudo systemctl stop postgresql@18-main
# OR using pg_ctlclustersudo pg_ctlcluster 18 main stop2.2 Wipe Data Directory
Section titled “2.2 Wipe Data Directory”Verify the data directory path:
pg_lsclustersExample output:
Ver Cluster Port Status Owner Data directory Log file18 main 5432 down postgres /var/lib/postgresql/18/main /var/log/postgresql/...Remove all files from the data directory:
sudo -u postgres bash -c "rm -rf /var/lib/postgresql/18/main/*"2.3 Configure Password File (Recommended)
Section titled “2.3 Configure Password File (Recommended)”To avoid entering passwords during pg_basebackup, create a .pgpass file:
sudo -u postgres bash -c 'echo "10.0.0.90:5432:*:replicator:ChangeMeStrongPassword" >> ~/.pgpass && chmod 600 ~/.pgpass'Verify .pgpass was created:
sudo -u postgres cat ~/.pgpass.pgpass Format:
hostname:port:database:username:password2.4 Perform Base Backup with Replication Slot
Section titled “2.4 Perform Base Backup with Replication Slot”Run pg_basebackup to clone the Primary database and create a dedicated replication slot.
On Standby Server 10.0.0.91:
Section titled “On Standby Server 10.0.0.91:”sudo -u postgres pg_basebackup \ -h 10.0.0.90 \ -p 5432 \ -U replicator \ -D /var/lib/postgresql/18/main \ -Fp -Xs -P -R \ -C -S replica_10_0_0_91On Standby Server 10.0.0.92:
Section titled “On Standby Server 10.0.0.92:”sudo -u postgres pg_basebackup \ -h 10.0.0.90 \ -p 5432 \ -U replicator \ -D /var/lib/postgresql/18/main \ -Fp -Xs -P -R \ -C -S replica_10_0_0_92Option Explanations:
| Option | Purpose |
|---|---|
-h 10.0.0.90 | Primary server hostname/IP |
-p 5432 | PostgreSQL port |
-U replicator | Replication user |
-D /var/lib/postgresql/18/main | Target data directory |
-Fp | Plain format (file-based copy) |
-Xs | Stream WAL during backup |
-P | Show progress |
-R | Create standby.signal and primary_conninfo automatically |
-C | Create replication slot |
-S replica_10_0_0_91 | Replication slot name (unique per standby) |
Expected Output:
pg_basebackup: initiating base backup, waiting for checkpoint to completepg_basebackup: checkpoint completedpg_basebackup: write-ahead log start point: 0/2000028 on timeline 1pg_basebackup: starting background WAL receiverpg_basebackup: created replication slot "replica_10_0_0_91"...pg_basebackup: base backup completed2.5 Start PostgreSQL on Standby
Section titled “2.5 Start PostgreSQL on Standby”Start the PostgreSQL service on the Standby server:
# Using systemctlsudo systemctl start postgresql@18-main
# OR using pg_ctlclustersudo pg_ctlcluster 18 main startVerify Service Started:
sudo systemctl status postgresql@18-mainStep 3: Verify Replication
Section titled “Step 3: Verify Replication”3.1 Verify Standby is in Recovery Mode (91/92)
Section titled “3.1 Verify Standby is in Recovery Mode (91/92)”On each Standby server, check if it’s running in recovery/standby mode:
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"Expected Output:
pg_is_in_recovery------------------- t(1 row)t(true) = Server is in standby/recovery mode ✅f(false) = Server is in normal mode (not a standby) ❌
3.2 Check Replication Status on Primary (90)
Section titled “3.2 Check Replication Status on Primary (90)”On the Primary server, verify that Standby servers are connected:
sudo -u postgres psql -c "SELECT client_addr, state, sync_state, write_lag, flush_lag, replay_lag FROM pg_stat_replication;"Expected Output:
client_addr | state | sync_state | write_lag | flush_lag | replay_lag-------------+-----------+------------+-----------+-----------+------------ 10.0.0.91 | streaming | async | 00:00:00 | 00:00:00 | 00:00:00 10.0.0.92 | streaming | async | 00:00:00 | 00:00:00 | 00:00:00(2 rows)Column Meanings:
| Column | Value | Meaning |
|---|---|---|
client_addr | 10.0.0.91 | IP of connected Standby |
state | streaming | Replication is active |
sync_state | async | Asynchronous replication |
write_lag | 00:00:00 | Time lag for writes |
flush_lag | 00:00:00 | Time lag for flushes |
replay_lag | 00:00:00 | Time lag for replays |
3.3 Verify Replication Slots on Primary (90)
Section titled “3.3 Verify Replication Slots on Primary (90)”Check that replication slots are active:
sudo -u postgres psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"Expected Output:
slot_name | active | restart_lsn--------------------+--------+------------- replica_10_0_0_91 | t | 0/3000000 replica_10_0_0_92 | t | 0/3000000(2 rows)active = tmeans the Standby is currently using the slot ✅active = fmeans the Standby is disconnected (WAL may accumulate) ⚠️
Step 4: Operations & Maintenance
Section titled “Step 4: Operations & Maintenance”4.1 Prevent WAL Disk Space Issues
Section titled “4.1 Prevent WAL Disk Space Issues”Problem: If a Standby server goes offline, WAL files can accumulate indefinitely on the Primary, filling up disk space.
Solution 1: Set max_slot_wal_keep_size (already configured in Step 1.1)
max_slot_wal_keep_size = '10GB'Solution 2: Drop inactive replication slots
If a Standby is permanently dead or decommissioned:
-- On Primary serverSELECT pg_drop_replication_slot('replica_10_0_0_91');4.2 Troubleshooting: Replication Not Working
Section titled “4.2 Troubleshooting: Replication Not Working”If replication stops or doesn’t start, check the following:
Network & Connectivity
Section titled “Network & Connectivity”# From Standby, test connection to Primarytelnet 10.0.0.90 5432
# ORnc -zv 10.0.0.90 5432Primary Configuration
Section titled “Primary Configuration”- Ensure
listen_addresses = '*'inpostgresql.conf - Verify
pg_hba.confhas replication rules for Standby IPs - Restart PostgreSQL after config changes
Check Logs
Section titled “Check Logs”On Primary:
sudo journalctl -u postgresql@18-main -n 200 --no-pagerOn Standby:
sudo journalctl -u postgresql@18-main -n 200 --no-pagerCommon Error Messages:
| Error | Cause | Solution |
|---|---|---|
FATAL: no pg_hba.conf entry | Missing HBA rule | Add host replication replicator ... to pg_hba.conf |
FATAL: password authentication failed | Wrong password | Check .pgpass or replication user password |
could not connect to the primary server | Network issue | Check firewall, network connectivity |
requested WAL segment has already been removed | WAL gap too large | Rebuild Standby with new pg_basebackup |
Step 5: Testing Replication
Section titled “Step 5: Testing Replication”5.1 Create Test Data on Primary
Section titled “5.1 Create Test Data on Primary”On the Primary server, create a test table and insert data:
-- Connect to Primarysudo -u postgres psql
-- Create test database (if it doesn't exist)CREATE DATABASE test_replication;
-- Connect to test database\c test_replication
-- Create test tableCREATE TABLE replication_test ( id SERIAL PRIMARY KEY, created_at TIMESTAMP DEFAULT NOW(), data TEXT);
-- Insert test dataINSERT INTO replication_test (data) VALUES ('Test record from Primary at ' || NOW());
-- Verify dataSELECT * FROM replication_test;5.2 Verify Data on Standby
Section titled “5.2 Verify Data on Standby”On each Standby server, verify the data replicated:
-- Connect to Standbysudo -u postgres psql
-- Connect to test database\c test_replication
-- Query replicated data (should match Primary)SELECT * FROM replication_test;Expected Result: The same data from the Primary should appear on the Standby.
Quick Reference
Section titled “Quick Reference”Primary Server (10.0.0.90)
Section titled “Primary Server (10.0.0.90)”Configuration Files:
- PostgreSQL Config:
/etc/postgresql/18/main/postgresql.conf - HBA Config:
/etc/postgresql/18/main/pg_hba.conf
Monitoring Commands:
# Check connected Standbyssudo -u postgres psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
# Check replication slotssudo -u postgres psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"
# View logssudo journalctl -u postgresql@18-main -fStandby Servers (10.0.0.91/92)
Section titled “Standby Servers (10.0.0.91/92)”Setup Commands:
# 1. Stop PostgreSQLsudo systemctl stop postgresql@18-main
# 2. Clear data directorysudo -u postgres bash -c "rm -rf /var/lib/postgresql/18/main/*"
# 3. Setup .pgpasssudo -u postgres bash -c 'echo "10.0.0.90:5432:*:replicator:PASSWORD" >> ~/.pgpass && chmod 600 ~/.pgpass'
# 4. Perform base backup (use unique slot name per standby)sudo -u postgres pg_basebackup \ -h 10.0.0.90 -p 5432 -U replicator \ -D /var/lib/postgresql/18/main \ -Fp -Xs -P -R \ -C -S replica_10_0_0_91
# 5. Start PostgreSQLsudo systemctl start postgresql@18-mainMonitoring Commands:
# Check if in recovery mode (should return 't')sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
# View logssudo journalctl -u postgresql@18-main -fBest Practices
Section titled “Best Practices”1. Monitoring
Section titled “1. Monitoring”- Set up monitoring alerts for replication lag
- Monitor disk space on Primary (WAL files can accumulate)
- Track
pg_stat_replicationmetrics regularly
2. Backup Strategy
Section titled “2. Backup Strategy”- Replication is NOT a backup (it replicates corruption too)
- Maintain regular
pg_dumpbackups - Test restore procedures periodically
3. Security
Section titled “3. Security”- Use strong passwords for replication user
- Restrict
pg_hba.confto specific Standby IPs - Consider SSL/TLS for replication connections in production:
# In pg_hba.confhostssl replication replicator 10.0.0.91/32 scram-sha-2564. Capacity Planning
Section titled “4. Capacity Planning”- Size
max_slot_wal_keep_sizebased on:- Network speed between Primary and Standby
- Expected downtime tolerance
- Available disk space
- Example: If Standby can be down for 1 hour and WAL generates 2GB/hour, set at least 5GB