pg_rewind in PostgreSQL

Short Summary:

pg_rewind is a tool for synchronizing a PostgreSQL cluster with another copy of the same cluster, after the clusters' timelines have diverged. A typical scenario is to bring an old master server back online after failover, as a standby that follows the new master. The result is equivalent to replacing the target data directory with the source one. All files are copied, including configuration files. The advantage of pg_rewind over taking a new base backup, or tools like rsync, is that pg_rewind does not require reading through all unchanged files in the cluster. That makes it a lot faster when the database is large and only a small portion of it differs between the clusters.

Official Documentation Link:

https://www.postgresql.org/docs/9.6/static/app-pgrewind.html

IPs Used for the Steps Below:

IP - 198.100.1.10 (Master/Primary)
IP - 198.100.1.11 (Slave/Replica/Secondary)

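Prerequisites:

Streaming replication is enabled between the master and the replica instance.
wal_log_hints must be enabled in postgresql.conf (or the cluster must have been initialized with data checksums), and full_page_writes must be on, which is the default. Changing wal_log_hints requires a restart, so it has to be in effect before the failover happens.
The user used for pg_rewind must be a superuser on the source server.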
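The wal_log_hints prerequisite can be checked ahead of time. The sketch below is one rough way to do it by reading postgresql.conf directly; the function name wal_log_hints_enabled is hypothetical, and it assumes the setting is not overridden in postgresql.auto.conf or an included file. On a running server, SHOW wal_log_hints; is the more reliable check.

```shell
# Hypothetical helper: succeeds (exit status 0) if the last uncommented
# wal_log_hints line in the given postgresql.conf enables the setting.
# Assumption: no override in postgresql.auto.conf or include files.
wal_log_hints_enabled() {
    conf_file=$1
    # Extract the value of every uncommented wal_log_hints line, keep the
    # last one (the last occurrence wins), and lowercase it.
    value=$(sed -n "s/^[[:space:]]*wal_log_hints[[:space:]]*=\{0,1\}[[:space:]]*'\{0,1\}\([A-Za-z0-9]*\).*/\1/p" "$conf_file" \
        | tail -n 1 | tr 'A-Z' 'a-z')
    case $value in
        on|true|yes|1) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, wal_log_hints_enabled /var/lib/pgsql/9.6/data/postgresql.conf && echo "wal_log_hints is on" could be run on both servers before relying on pg_rewind.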

Steps:

Suppose the master DB (198.100.1.10) has an issue and crashes. We need to fail over (promote, in PostgreSQL terms) to the replica (198.100.1.11) so that the application can reconnect and things get back to normal. The command below performs the promotion.
Command, issued on the replica - /usr/pgsql-9.6/bin/pg_ctl -D /var/lib/pgsql/9.6/data promote
After the promotion, 198.100.1.11 is the new master, and we need to get a replica of it in place as early as possible.
The traditional way to build a replica is to copy the data directory with pg_basebackup and start the replica instance. That can be very time-consuming if the DB size is in TB. To reduce the time and effort, we can use pg_rewind instead: it syncs the old master with the new master by copying only the data that changed after the two diverged, so the replica can often be ready in a few minutes.
To sync the old primary (198.100.1.10) with the new primary (198.100.1.11), the old primary must be stopped. If it is still running, stop it with the command below, issued on the old primary.
Command - /usr/pgsql-9.6/bin/pg_ctl -D /var/lib/pgsql/9.6/data stop
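In 9.6, pg_rewind refuses to run unless the target was shut down cleanly, so after a crash the old primary may first need to be started and stopped once more. As a minimal sketch, the helper below (the name is_cleanly_shut_down is hypothetical) reads pg_controldata output on standard input and checks the reported cluster state. It assumes English (LC_ALL=C) output from pg_controldata.

```shell
# Hypothetical helper: reads pg_controldata output on stdin and succeeds
# only if the cluster state is "shut down" or "shut down in recovery".
# Assumption: pg_controldata was run with English messages (LC_ALL=C).
is_cleanly_shut_down() {
    grep -Eq '^Database cluster state:[[:space:]]+shut down( in recovery)?[[:space:]]*$'
}
```

A possible use on the old primary is LC_ALL=C /usr/pgsql-9.6/bin/pg_controldata /var/lib/pgsql/9.6/data | is_cleanly_shut_down && echo "safe to rewind", assuming pg_controldata is installed next to pg_ctl.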
Before we run pg_rewind, it is a good idea to back up the configuration files below, because pg_rewind overwrites them with the copies from the source -
A. pg_hba.conf
B. postgresql.conf
C. postgresql.auto.conf
D. recovery.conf
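The backup of these four files could be scripted roughly as follows. This is only a sketch: the function name backup_pg_configs and the backup location are assumptions, and files that do not exist (recovery.conf is usually absent on a server that was running as primary) are skipped with a message.

```shell
# Hypothetical helper: copy the configuration files listed above from the
# data directory into a backup directory, skipping any that are missing.
backup_pg_configs() {
    data_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for name in pg_hba.conf postgresql.conf postgresql.auto.conf recovery.conf; do
        if [ -f "$data_dir/$name" ]; then
            # -p preserves ownership (where permitted), mode and timestamps.
            cp -p "$data_dir/$name" "$backup_dir/" || return 1
        else
            echo "skipping $name (not present in $data_dir)" >&2
        fi
    done
}
```

For example, backup_pg_configs /var/lib/pgsql/9.6/data /var/lib/pgsql/9.6/conf_backup keeps the copies outside the data directory, so pg_rewind does not touch them.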
Now we are ready to run pg_rewind on the old primary (the future replica) using the command below -
Template Command - /usr/pgsql-9.6/bin/pg_rewind --target-pgdata /var/lib/pgsql/9.6/data --source-server="port=<Source DB Port> host=<Source Host> dbname=postgres user=<Source DB Superuser> password=<Password for Superuser>" --progress
Example Command - /usr/pgsql-9.6/bin/pg_rewind --target-pgdata /var/lib/pgsql/9.6/data --source-server="port=5432 host=198.100.1.11 dbname=postgres user=superuser password=SomeStrongPassword" --progress
Sample output after running pg_rewind -
connected to server
servers diverged at WAL position 198/C3C13ED8 on timeline 1
rewinding from last common checkpoint at 198/C3C13E30 on timeline 1
reading source file list
reading target file list
reading WAL in target
need to copy 629 MB (total source directory size is 51531 MB)
644481/644481 kB (100%) copied
creating backup label and updating control file
syncing target data directory
Done!
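In the sample output above, about 629 MB out of 51531 MB (roughly 1.2% of the source directory) had to be copied. To see such numbers before anything is modified, pg_rewind can be run with --dry-run first. The wrapper below is a sketch under two assumptions: the function name rewind_dry_run and the PG_REWIND override variable are hypothetical, and the password is supplied through the standard libpq PGPASSWORD variable or ~/.pgpass rather than on the command line, where it would be visible in the process list.

```shell
# Hypothetical wrapper: run pg_rewind in dry-run mode, which reports what
# would be done without modifying the target data directory.
# PG_REWIND is an assumed override for the binary path, handy for testing.
rewind_dry_run() {
    target_pgdata=$1
    source_conninfo=$2
    "${PG_REWIND:-/usr/pgsql-9.6/bin/pg_rewind}" \
        --target-pgdata "$target_pgdata" \
        --source-server="$source_conninfo" \
        --dry-run --progress
}
```

A call might look like PGPASSWORD=SomeStrongPassword rewind_dry_run /var/lib/pgsql/9.6/data "port=5432 host=198.100.1.11 dbname=postgres user=superuser". If the dry run looks reasonable, the real command is the same without --dry-run.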
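Once pg_rewind is complete, configure recovery.conf, pg_hba.conf and postgresql.conf on the new replica (restore them from the backup if they differ between master and replica), and then start the instance, for example with /usr/pgsql-9.6/bin/pg_ctl -D /var/lib/pgsql/9.6/data start.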
Sample recovery.conf file contents on the new replica (198.100.1.10), pointing at the new master (198.100.1.11) -
primary_conninfo = 'host=198.100.1.11 port=5432 user=replicator password=StrongPass'
trigger_file = '/var/lib/pgsql/9.6/data/failover'
standby_mode = 'on'
recovery_target_timeline = 'latest'
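A recovery.conf like the sample above could be generated with a small helper, which makes it harder to leave the wrong host in primary_conninfo after a failover. This is a sketch only: the function name write_recovery_conf is hypothetical, and it assumes the replication password comes from ~/.pgpass instead of being written into the file.

```shell
# Hypothetical helper: write a 9.6-style recovery.conf into the data
# directory of the new replica, pointing at the given primary host.
# Assumption: the replication password is provided via ~/.pgpass.
write_recovery_conf() {
    data_dir=$1
    primary_host=$2
    repl_user=${3:-replicator}
    cat > "$data_dir/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$primary_host port=5432 user=$repl_user'
recovery_target_timeline = 'latest'
trigger_file = '$data_dir/failover'
EOF
    # The file may contain connection details, so restrict it to the owner.
    chmod 600 "$data_dir/recovery.conf"
}
```

On the old primary this might be called as write_recovery_conf /var/lib/pgsql/9.6/data 198.100.1.11, so that it follows the new master once it is started.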
The IP of the new replica must be allowed for the replication user in pg_hba.conf on the new master, so that there is no issue with replication after failover. It is good practice to keep all the configuration files the same on both primary and replica.
Lastly, it is good to check that replication is working, using the two commands below on the replica -
Command 1 -
SELECT
CASE
WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER
END
AS replication_lag;
Command 2 -
SELECT pg_last_xlog_receive_location(); SELECT pg_last_xlog_replay_location(); (Both values should keep advancing while the master is receiving writes, and they should match once the replica has caught up.)
Another simple way to check that replication is working is to create a new DB on the master and check that it appears on the replica (if the DBA is allowed to create a test DB).
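When the two locations from Command 2 differ, it helps to know by how many bytes. An xlog location such as 198/C3C13ED8 is two hexadecimal numbers: the high 32 bits before the slash and the low 32 bits after it. The sketch below does that arithmetic in the shell; the function names lsn_to_bytes and lsn_diff are hypothetical. On the server, pg_xlog_location_diff() computes the same difference.

```shell
# Convert an xlog location such as 198/C3C13ED8 to an absolute byte position.
# The part before the slash is the high 32 bits, the part after it the low 32.
lsn_to_bytes() {
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

# Byte distance between two locations (first minus second).
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Example with the two positions from the pg_rewind sample output:
# divergence point minus last common checkpoint.
lsn_diff 198/C3C13ED8 198/C3C13E30   # prints 168
```

The same call with the receive and replay locations as arguments gives the number of received but not yet replayed bytes on the replica.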

Open Source Concepts - PostgreSQL, MySQL and Linux

Tuesday, 11 September 2018