PostgreSQL with replication slots on FreeBSD

I wanted to setup a PostgreSQL database replication where you have one read/write master, and one or more readonly slaves, pretty much what has been possible with MySQL, too. As always, on FreeBSD that's fairly easy:

  1. Install PostgreSQL and contrib utilities on both the master and slave servers:
    # make -C /usr/ports/databases/postgresql10-server install clean
    # make -C /usr/ports/databases/postgresql10-contrib install clean
    
    For the server and client, I always choose the INTDATE, OPTIMIZED_CFLAGS, SSL, TZDATA, and XML options
  2. On all servers, add PostgreSQL to the startup config:
    # sysrc postgresql_enable=YES
    
  3. Optionally, on all servers, change the home directory of the postgres user:
    # pw usermod postgres -d /tank/postgres -m
    # rm -v /tank/postgres/.[a-z]*
    
    At least make sure that on all servers the home directory is the same
  4. Just on the master server, initialize the database cluster:
    # service postgresql initdb
    
  5. Edit the main configuration file /tank/postgres/data10/postgresql.conf and add the ip adress of the master server to the parameter listen_addresses, e.g.:
    listen_addresses = 'localhost, 10.0.0.10'
    
    In this case, 10.0.0.10 is the ip address that the slave database servers will connect to. As an alternative you could have your PostgreSQL master server listen on all configured ip addresses:
    listen_addresses = '*'
    
  6. Start the master database server:
    service postgresql start
    
  7. Connect to the instance and set a password for the database super-user postgres:
    # psql postgres postgres
    Password for user postgres: 
    psql (10.0)
    Type "help" for help.
    
    postgres=# ALTER USER postgres PASSWORD 'secret';
    ALTER ROLE
    postgres=# 
    
  8. Still in the psql client, create a user designated for the replication only, and a replication slot for the slave database server. Then, leave the client by either pressing CTRL+D or by entering the command \q:
    postgres=# CREATE USER replicator REPLICATION PASSWORD 'moarsecret';
    CREATE ROLE
    postgres=# SELECT * FROM pg_create_physical_replication_slot('server2');
     slot_name | lsn 
    -----------+-----
     server2   | 
    (1 row)
    
    postgres=# \q
    #
    
  9. Edit the authentication configuration file /tank/postgres/data10/pg_hba.conf and enforce the use of passwords, e.g.:
    # TYPE  DATABASE        USER            ADDRESS                 METHOD
    local   all             all                                     md5
    host    all             all             10.0.2.0/24             md5
    host    replication     replicator      10.0.0.11/32            md5
    
    This opens connections to all databases for all users from either the master server itself of from the subnet 10.0.2.0/24. Additionally, the user replicator can connect from the host 10.0.0.11, which will be a slave server, but is just allowed to do streaming replication due to the replication entry in the database field of pg_hba.conf
  10. Reload the server's configuration to pull in the changed pg_hba.conf:
    # psql -c "SELECT pg_reload_conf()" postgres postgres
    Password for user postgres: 
     pg_reload_conf 
    ----------------
     t
    (1 row)
    
    #
    
  11. Now login to the slave database server as the system user under which PostgreSQL runs, e.g. postgres, and grab an initial copy of the master server:
    # su - postgres
    # pg_basebackup -D /tank/postgres/data10 -R -v -P -S server2 -h 192.168.255.193 -U replicator
    Password: 
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    37642/37642 kB (100%), 1/1 tablespace                                         
    pg_basebackup: write-ahead log end point: 0/C5000130
    pg_basebackup: waiting for background process to finish streaming ...
    pg_basebackup: base backup completed
    
    The above output is from a small live database, thus actual values may vary
  12. Edit /tank/postgres/data10/postgresql.conf and change the parameter listen_addresses to reflect the slave database server, e.g.:
    listen_addresses = 'localhost, 10.0.0.11'
    
  13. As user root, start the slave database:
    # service postgresql start
    
  14. Check whether the replication is running and keeping up. On the master server, display both the current replication slots and any active replication:
    # psql -c "SELECT * FROM pg_replication_slots" -c "SELECT * FROM pg_stat_replication" postgres postgres
    Password for user postgres: 
     slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
    -----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
      server2  |        | physical  |        |          | f         | t      |      95670 |      |              | 0/C8000CF0  | 
    (1 row)
    
      pid  | usesysid |  usename   | application_name | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state 
    -------+----------+------------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------
     95670 |   550154 | replicator | walreceiver      |  10.0.0.11  |                 |       33488 | 2017-11-05 12:56:57.616059+01 |              | streaming | 0/C8000CF0 | 0/C8000CF0 | 0/C8000CF0 | 0/C8000CF0 |           |           |            |             0 | async
    (1 row)
    
    There should be one entry for each slave database in pg_stat_replication. restart_lsn and replay_lsn should match. replay_lag should be null.
  15. On the slave servers, you have to compare this to the output of pg_stat_wal_receiver:
    # psql -c "SELECT * FROM pg_stat_wal_receiver" postgres postgres
    Password for user postgres: 
     pid |  status   | receive_start_lsn | receive_start_tli | received_lsn | received_tli |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time        | slot_name  |                                                                                     conninfo                                                                                      
    -----+-----------+-------------------+-------------------+--------------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     787 | streaming | 0/C8000000        |                 1 | 0/C8000CF0   |            1 | 2017-11-05 13:14:51.049779+01 | 2017-11-05 13:14:51.116915+01 | 0/C8000CF0     | 2017-11-05 13:10:50.248827+01 |  server1   | user=replicator password=******** dbname=replication host=10.0.0.11 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=1 target_session_attrs=any
    (1 row)
    
    latest_end_lsn should match replay_lsn.
  16. If you decommission a slave database forever, then do not forget to remove its replication slot on the master server like so:
    # psql -c "SELECT * FROM pg_drop_replication_slot('server3')" postgres postgres
    Password for user postgres: 
     pg_drop_replication_slot 
    --------------------------
     
    (1 row)
    
    If you forget to drop a replication slot for a non-existing server, then the master database will keep write-ahead-logs (WAL files) forever, causing /tank/postgres/data10/pg_wal to grow.