Monday 29 January 2018

Replication in PostgreSQL

Synchronous Replication:

  1. Ensures that a transaction is first written to the transaction log (WAL) on disk on both the master and the slave before the commit is acknowledged to the client.
  2. It provides maximum data protection.
  3. Synchronous replication offers the ability to confirm that all changes made by a transaction have been transferred to at least one synchronous standby server.
  4. Data can be lost only if both the master and the slave crash at the same time.
  5. The extra commit latency is at most one round trip from master to slave and back. Once streaming replication has been configured, configuring synchronous replication requires only one additional configuration step: synchronous_standby_names must be set to a non-empty value. A sample configuration is shown below.
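
A minimal sketch of the master-side configuration (the standby name 'standby1' is illustrative and must match the application_name the standby uses in its primary_conninfo):

postgresql.conf (master):
     synchronous_standby_names = 'standby1'
     synchronous_commit = on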

Asynchronous Replication:


  1. This is the default mode.
  2. If the master/primary server crashes, some transactions that were committed on it may not have been replicated to the standby/slave. The amount of data loss is proportional to the replication delay at the time of failover; the query below shows a standby's sync state.
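
Whether a connected standby is synchronous or asynchronous can be checked on the master via pg_stat_replication (a minimal sketch; an asynchronous standby shows sync_state = 'async'):

postgres=# SELECT application_name, state, sync_state FROM pg_stat_replication;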

Logical Replication:


  1. Uses replication slots together with streaming replication.
  2. Replication slot - a slot represents a stream of changes that can be replayed to a client in the order they were made on the origin server. Each slot streams a sequence of changes from a single database, sending each change exactly once. Only one receiver may consume changes from a slot at any given time.
  3. It can only replicate DML changes, not DDL changes.
  4. Because the changes are decoded logically, it can replicate between different PostgreSQL versions and even feed other databases.
  5. Replication through a slot can also be configured as synchronous replication.
  6. A WAL (xlog) segment is deleted only after the changes in it have been consumed from the slot.
  7. It does not need any extra overhead process (such as triggers) on the server, as it reads changes from the WAL files.
  8. It is a gradual step towards multi-master replication.
  9. Connections that stream from a replication slot are counted towards max_connections.
  10. Process - logical decoding pulls the changes out of the WAL. The decoder needs access to pg_catalog, as it looks up various information there. The DML changes are decoded in the decoder, and an output plugin attached to the decoder reads the decoded changes and sends statements to the slave server. The plugin can be customized for our own use; the same mechanism can feed an analytical system as well as logical replication.
  11. We should make sure replication slots (max_replication_slots), WAL senders (max_wal_senders) and WAL receivers are enabled on the master and slave.
  12. A sketch of the master settings is given first, followed by a worked example.
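
A minimal sketch of the master-side settings needed before creating a logical replication slot (assuming PostgreSQL 9.4 or later; the values are illustrative):

postgresql.conf (master):
     wal_level = logical
     max_replication_slots = 4
     max_wal_senders = 4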


postgres=# -- Create a slot named 'regression_slot' using the output plugin 'test_decoding'

postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');

   slot_name    | xlog_position

-----------------+---------------

regression_slot | 0/16B1970

(1 row)

postgres=# SELECT * FROM pg_replication_slots;

   slot_name    |    plugin     | slot_type | datoid | database | active |  xmin  | catalog_xmin | restart_lsn

-----------------+---------------+-----------+--------+----------+--------+--------+--------------+-------------
regression_slot | test_decoding | logical   |  12052 | postgres | f      |        |          684 | 0/16A4408
(1 row)

postgres=# -- There are no changes to see yet
postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
location | xid | data
----------+-----+------
(0 rows)

postgres=# CREATE TABLE data(id serial primary key, data text);
CREATE TABLE

postgres=# -- DDL isn't replicated, so all you'll see is the transaction
postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
location  | xid |    data
-----------+-----+------------
0/16D5D48 | 688 | BEGIN 688
0/16E0380 | 688 | COMMIT 688
(2 rows)

postgres=# -- Once changes are read, they're consumed and not emitted
postgres=# -- in a subsequent call:
postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
location | xid | data
----------+-----+------
(0 rows)

postgres=# BEGIN;
postgres=# INSERT INTO data(data) VALUES('1');
postgres=# INSERT INTO data(data) VALUES('2');
postgres=# COMMIT;

postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
location  | xid |                     data
-----------+-----+-----------------------------------------------
0/16E0478 | 689 | BEGIN 689
0/16E0478 | 689 | table public.data: INSERT: id[integer]:1 data[text]:'1'
0/16E0580 | 689 | table public.data: INSERT: id[integer]:2 data[text]:'2'
0/16E0650 | 689 | COMMIT 689
(4 rows)

postgres=# INSERT INTO data(data) VALUES('3');

postgres=# -- You can also peek ahead in the change stream without consuming changes
postgres=# SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL);
location  | xid |                     data
-----------+-----+-----------------------------------------------
0/16E09C0 | 690 | BEGIN 690
0/16E09C0 | 690 | table public.data: INSERT: id[integer]:3 data[text]:'3'
0/16E0B90 | 690 | COMMIT 690
(3 rows)

postgres=# -- The next call to pg_logical_slot_peek_changes() returns the same changes again
postgres=# SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL);
location  | xid |                     data
-----------+-----+-----------------------------------------------
0/16E09C0 | 690 | BEGIN 690
0/16E09C0 | 690 | table public.data: INSERT: id[integer]:3 data[text]:'3'
0/16E0B90 | 690 | COMMIT 690
(3 rows)

postgres=# -- options can be passed to output plugin, to influence the formatting
postgres=# SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'include-timestamp', 'on');
location  | xid |                     data
-----------+-----+-----------------------------------------------
0/16E09C0 | 690 | BEGIN 690
0/16E09C0 | 690 | table public.data: INSERT: id[integer]:3 data[text]:'3'
0/16E0B90 | 690 | COMMIT 690 (at 2014-02-27 16:41:51.863092+01)
(3 rows)

postgres=# -- Remember to destroy a slot you no longer need to stop it consuming
postgres=# -- server resources:
postgres=# SELECT pg_drop_replication_slot('regression_slot');
pg_drop_replication_slot
-----------------------

(1 row)
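
The same change stream can also be consumed from the command line with pg_recvlogical (a sketch, assuming PostgreSQL 9.4 or later and the same test_decoding plugin; the slot name demo_slot is illustrative):

     pg_recvlogical -d postgres --slot demo_slot --create-slot --plugin test_decoding
     pg_recvlogical -d postgres --slot demo_slot --start -f -      # stream decoded changes to stdout
     pg_recvlogical -d postgres --slot demo_slot --drop-slot       # drop the slot when finished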



Physical Replication:

  1. Streaming Replication
  2. Hot Standby
  3. Warm Standby
  4. Log shipping

Streaming Replication:

There should always be a real-time copy of the data so that it stays safe even if the main server fails. Streaming replication is one of the processes that continuously syncs data in real time to a different server. The main server is called the master, and the server the data is synced to is called the slave.

The wal_sender process keeps track of changed WAL blocks and sends them to the replica server, where the wal_receiver process makes sure they are applied to the database. Streaming replication can replicate changes practically row by row, since the process reads them from the xlog.

Setting up Streaming Replication:


Pre-Requisites:
  1. The slave's data directory should be empty
  2. Same PostgreSQL version (9.2.2)
  3. Same OS - Linux (Ubuntu)
  4. Same architecture - 32-bit
Steps:

1) Set up SSH keys so that the master and the slave can connect to each other without a password.
 
   a) As the root user, create a key pair on the master and on the slave
            Master: ssh-keygen -t rsa
            Slave: ssh-keygen -t rsa
   b) Now copy the key from master to slave and from slave to master
             Master: vi authorized_keys
                        cat id_rsa.pub from the slave and paste it into authorized_keys
             Slave: Do the same thing on the slave with the master's id_rsa.pub

2) Change the permission of the authorized_keys file using the below command -
      [root@linux] chmod 600 authorized_keys

3) We need to repeat the above steps to set up the keys as the postgres user: if server A needs to connect to server B as postgres, then the public key of server A must be copied into authorized_keys on server B. Permissions are very important; without them it won't work. Also make sure known_hosts is cleared up before/after the ssh-keygen setup.
       .ssh permission - 700
       authorized_keys - 640

4) Check whether the master can connect to the slave without a password:
         ssh postgres@192.168.175.3
     If the connection still fails with an error, reset the postgres user's OS password:
         passwd postgres

5) Create a directory named wallogs under / on the slave and change its ownership:
      chown -R postgres wallogs

6) Change postgresql.conf (master):
     archive_mode = on
     archive_command = 'ssh slave_host test -d /wallogs && rsync -a %p slave_host:/wallogs/%f'
      (it SSHes to the slave, tests that /wallogs exists, and if everything is OK copies the WAL file to the slave's /wallogs)
      wal_level = hot_standby
      max_wal_senders = 2
      wal_keep_segments = 30

7) Restart the master
    service postgresql restart

8) We can switch the WAL file to check that log files are getting created and shipped.
    select pg_switch_xlog();

9) Change in pg_hba.conf
    type    database      user    address (slave)          method
    host    all           all     192.168.175.196/32       trust
    host    replication   all     192.168.175.196/32       trust
     Change all md5 entries to trust.

10) Next, we will run commands on the slave. We won't touch the master.

11) Keep a backup of postgresql.conf and pg_hba.conf in the home directory.

12) As root on the slave:
     rm /tmp/data_pgsql.trigger.5432
     /etc/init.d/postgresql stop
     rm /usr/local/pgsql/data/ -rf
     echo "SELECT pg_start_backup('`hostname``/bin/date '+%Y-%m-%d-%T'`');" | psql -h master_host -U postgres
     rsync -avz   postgres@master_host:/usr/local/pgsql/data/  /usr/local/pgsql/data/
     su - postgres -c "/usr/local/pgsql/bin/psql -h master_host -c 'SELECT pg_stop_backup();'"
     rm /usr/local/pgsql/data/pg_xlog/archive_status/*
     rm /usr/local/pgsql/data/pg_xlog/*

13) Change in postgresql.conf on the slave
      hot_standby = on

14) Create recovery.conf
      standby_mode = 'on'
      primary_conninfo = 'host=master_host port=5432 user=postgres'
      restore_command = 'cp /wallogs/%f %p'
      trigger_file = '/tmp/pgsql.trigger'
      archive_cleanup_command = 'pg_archivecleanup /wallogs/ps_admin_juliet %r'

15) Remove the postmaster.pid before restarting the slave server.

16) Restart the slave and check whether it is working.

17) Monitoring: pg_stat_replication

     Large differences between pg_current_xlog_location and sent_location field might indicate that the master server is under heavy load, while differences between sent_location and pg_last_xlog_receive_location on the standby might indicate network delay, or that the standby is under heavy load.
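
A minimal sketch of a lag check run on the master, comparing the current WAL write position against what each standby has received and replayed (pg_xlog_location_diff and these column names are the 9.x names; they were renamed in PostgreSQL 10):

postgres=# SELECT application_name,
                  pg_xlog_location_diff(pg_current_xlog_location(), sent_location)   AS send_lag_bytes,
                  pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes
             FROM pg_stat_replication;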

Recovery.conf File:

standby_mode = 'on'
primary_conninfo = 'host=master_host port=5432 user=postgres'
restore_command = 'cp /wallogs/%f %p'
trigger_file = '/tmp/pgsql.trigger'
archive_cleanup_command = 'pg_archivecleanup /wallogs/ps_admin_juliet %r'
recovery_target_name = recover up to a named restore point created with pg_create_restore_point()
recovery_target_time = recover up to a specific point in time
recovery_target_xid = recover up to a specific transaction ID
recovery_target_inclusive = true (default) includes the recovery target; false stops just before it
recovery_target_timeline = recover along a specific timeline
pause_at_recovery_target = whether recovery should pause when the recovery target is reached
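
As a sketch of how a named recovery target is used (the restore point name before_fosdem is illustrative): create a restore point on the master before a risky change, then reference it during recovery:

postgres=# SELECT pg_create_restore_point('before_fosdem');

recovery.conf:
      restore_command = 'cp /wallogs/%f %p'
      recovery_target_name = 'before_fosdem'
      recovery_target_inclusive = true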

What happens at the end of recovery?

  1. End of recovery means the point where the database opens up for writing
  2. New timeline is chosen
  3. A timeline history file is written
  4. The partial last WAL file on the previous timeline is copied with the new timeline's ID
  5. A checkpoint record is written on the new timeline
Example:

LOG: database system was interrupted; last known up at 2013-01-30
21:45:14 EET
LOG: starting archive recovery
LOG: redo starts at 13/E00000C8
LOG: could not open file "pg_xlog/0000000100000013000000E4": No
such file or directory
LOG: redo done at 13/E3D389A0
LOG: last completed transaction was at log time 2013-01-30 21:45:20+02
LOG: selected new timeline ID: 2
LOG: archive recovery complete
LOG: database system is ready to accept connections

Partial File:

0000000100000013000000E4
------------------- - - - - - - - -  - - -      - - - - ( Unused Portion)
0000000200000013000000E4
------------------------------------           --------(Merged Portion)
Both files are partial at the point of the timeline switch; as the new timeline continues, the partial file is filled and becomes a normal WAL file.

TimeLine History File:

After timeline switch there is a history file in between
0000000100000013000000E4
0000000100000013000000E5
00000002.history
0000000200000013000000E3
0000000200000013000000E4

cat 00000002.history
13/E4000000  no recovery target specified
13/ED000090  at restore point "before FOSDEM"

Promoting a standby also switches timeline.
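
A standby can be promoted either by creating the trigger file named in recovery.conf or with pg_ctl (a sketch reusing the paths from the setup above):

      touch /tmp/pgsql.trigger
      # or
      /usr/local/pgsql/bin/pg_ctl promote -D /usr/local/pgsql/data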

Pg_receivexlog:

  1. pg_receivexlog streams the transaction log from a running PostgreSQL cluster.
  2. This tool uses the streaming replication protocol to stream transaction logs to a local directory.
  3. These files can be used for PITR.
  4. Transaction logs are streamed in real time, so pg_receivexlog can be used instead of archive_command.
  5. EX: pg_receivexlog -h localhost -D /usr/local/wal_archive
  6. It can be used as an alternative to archiving or alongside streaming replication.
  7. max_wal_senders must be at least 1, and an entry must be added in pg_hba.conf allowing the client IP to connect as a replication user; then it will work. A sketch of this configuration follows the list.
  8. It uses the replication protocol.
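
A minimal sketch of the server-side settings that allow pg_receivexlog to connect (addresses and the archive directory are illustrative):

postgresql.conf (server being archived):
      wal_level = archive
      max_wal_senders = 1

pg_hba.conf (allow the archiving host as a replication connection):
      host    replication    postgres    192.168.0.20/32    trust

On the archiving host:
      pg_receivexlog -h 192.168.0.10 -U postgres -D /usr/local/wal_archive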

Replication Troubleshooting:

  1. Generally we look into the view pg_stat_replication.
  2. The below query, run on the standby, reports the replay lag in seconds:
  3. psql>select extract( 'epoch' from now() - pg_last_xact_replay_timestamp())::int ;
  4. A few more queries are below:
$ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host)
 pg_current_xlog_location 
--------------------------
 0/2000000
(1 row)

$ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby host)
 pg_last_xlog_receive_location 
-------------------------------
 0/2000000
(1 row)

$ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (standby host)
 pg_last_xlog_replay_location 
------------------------------
 0/2000000
(1 row)

Pg_stat_replication:

postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 14447
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | 172.31.28.103
client_hostname  |
client_port      | 34792
backend_start    | 2016-11-29 00:09:15.958171-05
backend_xmin     |
state            | streaming
sent_location    | 0/2000060
write_location   | 0/2000060
flush_location   | 0/2000060
replay_location  | 0/2000060
sync_priority    | 0
sync_state       | async


Column           | Type                     | Description
-----------------+--------------------------+--------------------------------------------------------------
pid              | integer                  | Process ID of a WAL sender process
usesysid         | oid                      | OID of the user logged into this WAL sender process
usename          | name                     | Name of the user logged into this WAL sender process
application_name | text                     | Name of the application that is connected to this WAL sender
client_addr      | inet                     | IP address of the client connected to this WAL sender. If this field is null, it indicates that the client is connected via a Unix socket on the server machine.
client_hostname  | text                     | Host name of the connected client, as reported by a reverse DNS lookup of client_addr. This field will only be non-null for IP connections, and only when log_hostname is enabled.
client_port      | integer                  | TCP port number that the client is using for communication with this WAL sender, or -1 if a Unix socket is used
backend_start    | timestamp with time zone | Time when this process was started, i.e., when the client connected to this WAL sender
state            | text                     | Current WAL sender state
sent_location    | text                     | Last transaction log position sent on this connection
write_location   | text                     | Last transaction log position written to disk by this standby server
flush_location   | text                     | Last transaction log position flushed to disk by this standby server
replay_location  | text                     | Last transaction log position replayed into the database on this standby server
sync_priority    | integer                  | Priority of this standby server for being chosen as the synchronous standby
sync_state       | text                     | Synchronous state of this standby server

Cascading Replication:

  1. Allows a standby server to accept connections and stream WAL to other standby servers, acting like a relay. This helps reduce the number of direct connections to the master.
  2. A standby acting as both a receiver and a sender is known as a cascading standby.
  3. Standby servers that are closer to the master are called upstream servers, and those further away are called downstream servers.
  4. Cascading replication is presently asynchronous.
  5. We need to set primary_conninfo in the downstream standby to point to the cascading standby, as in the sketch below.
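
A minimal sketch of the two ends of a cascaded setup (host names are illustrative): the cascading standby must itself be able to send WAL, and the downstream standby points its primary_conninfo at the cascading standby instead of the master:

postgresql.conf (cascading standby):
      hot_standby = on
      max_wal_senders = 2

recovery.conf (downstream standby):
      standby_mode = 'on'
      primary_conninfo = 'host=cascading_standby_host port=5432 user=postgres'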

Log Shipping/Warm Standby:

  1. Log shipping is asynchronous.
  2. wal_level = archive (master).
  3. hot_standby = off (slave).
  4. Clients cannot connect to the standby; connections are refused with "the database system is starting up".
  5. Also known as Warm Standby. A configuration sketch follows the list.
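
A minimal warm-standby sketch (reusing the archive_command and paths from the setup above; values are illustrative):

postgresql.conf (master):
      wal_level = archive
      archive_mode = on
      archive_command = 'ssh slave_host test -d /wallogs && rsync -a %p slave_host:/wallogs/%f'

postgresql.conf (slave):
      hot_standby = off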

Hot-Standby:

  1. wal_level = hot_standby (master).
  2. hot_standby = on (slave).
  3. Clients can connect to the standby for read-only operations.
  4. With archive-based replay the slave can be up to one archived WAL segment behind the master, since it reads from the archive location. A configuration sketch follows the list.
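
A minimal hot-standby sketch (values are illustrative); the only configuration differences from the warm standby above are the WAL level on the master and the hot_standby switch on the slave:

postgresql.conf (master):
      wal_level = hot_standby

postgresql.conf (slave):
      hot_standby = on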

