Open Source Concepts - PostgreSQL, MySQL and Linux

Monday, 8 July 2019

Free Space Mapping Vs Visibility Map

Free Space Map ( FSM) Vs Visibility Map (VM)

Free Space Map ( FSM)

When inserting a heap or an index tuple, PostgreSQL uses the FSM of the corresponding table or index to select the page which can be inserted it. All tables and indexes have respective FSMs. Each FSM stores the information about the free space capacity of each page within the corresponding table or index file. All FSMs are stored with the suffix 'fsm', and they are loaded into shared memory if necessary. The extension pg_freespacemap provides the freespace of the specified table/index.

Visibility Map ( VM)

Vacuum processing is costly; thus, the VM was been introduced in version 8.4 to reduce this cost. The basic concept of the VM is simple. Each table has an individual visibility map that holds the visibility of each page in the table file. The visibility of pages determines whether each page has dead tuples. Vacuum processing can skip a page that does not have dead tuples.

Suppose that the table consists of three pages, and the 0th and 2nd pages contain dead tuples and the 1st page does not. The VM of this table holds information about which pages contain dead tuples. In this case, vacuum processing skips the 1st page by referring to the VM's information.

Friday, 29 March 2019

Create Read only User in PostgreSQL ( RDS and On-Prem)

-- Create a group

CREATE ROLE readaccess;

-- Grant access to existing tables

GRANT USAGE ON SCHEMA public TO readaccess;

GRANT SELECT ON ALL TABLES IN SCHEMA public TO readaccess;

-- Grant access to future tables

ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO readaccess;

-- Create a final user with password

CREATE USER rahul WITH PASSWORD 'password';

GRANT readaccess TO rahul;

CREATE USER rahul_saha WITH PASSWORD 'password';

GRANT readaccess TO rahul_saha;

It is allowed by default for every user to create tables in public schema. If you want to mitigate this, do the following:

REVOKE ALL ON SCHEMA public FROM public

GRANT ALL ON SCHEMA public TO writeuser

Tuesday, 11 September 2018

Installing & Configuring POWA (PostgreSQL Workload Analyzer) in Linux

POWA is another very good tool for PostgreSQL DBA. It helps analyzing the stats of queries using few extensions and in UI based format. Below are some important details how to install and configure POWA in linux and view the stats over web.

Some of the important points of POWA -

POWA current version supports PostgreSQL version 9.4 and above but if your PostgreSQL version is less than 9.4 then there is a different version which is POWA version 1.x for it.
POWA Supports various extensions ( pg_stat_statement, pg_qualstat, pg_stat_kcache,hypopg).

Some of the Pre-requisite -

POWA must be installed on same instance where PG is installed.
Shared_preload_libraries needs to be modified ( Postgres service needs reboot).
Contrib package needs to be installed.
POWA user should be created which needs superuser privileges.
POWA Web needs to be configured to view the stats.

Links from where Extensions can be downloaded which are needed for POWA -

hypopg - https://github.com/HypoPG/hypopg/releases
POWA - https://github.com/powa-team/
POWA Installation - https://powa.readthedocs.io/en/latest/quickstart.html#install-powa-from-packages-on-rhel-centos
PG_Qualstat - https://github.com/powa-team/pg_qualstats/archive/1.0.2.tar.gz
PG STAT Kcache - https://github.com/powa-team/pg_stat_kcache/archive/REL2_1_0.tar.gz

Steps to Implement POWA:

Install all the extension which are not available.
Mandatory Extensions are - "pg_stat_statements,powa,pg_stat_kcache,pg_qualstats, hypopg” in one DB and hypopg in all the DB.
Download the extensions with the above links.(make || make install)
Make changes in postgresql.conf file for the parameter - shared_preload_libraries='pg_stat_statements,powa,pg_stat_kcache,pg_qualstats’
Next create Database POWA and create the extensions.
\c to POWA

CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION btree_gist;
CREATE EXTENSION powa;
CREATE EXTENSION pg_qualstats;
CREATE EXTENSION pg_stat_kcache;
CREATE EXTENSION hypopg; to all DB
CREATE ROLE powa SUPERUSER LOGIN PASSWORD 'astrongpassword’;

Install powa-web .
Dependency to install powa web are - python, python-psycopg2, python-sqlalchemy & python-tornado.
Copy Powa Web config file to /etc/powa.conf
Edit powa web config file with address and port of web.
Start powa web
Check the console from web.

Install Powa -Web:

wget https://pypi.io/packages/source/p/powa-web/powa-web-3.1.3.tar.gz
tar -zxvf powa-web-3.1.3.tar.gz
cd powa-web-3.1.3
cp ./powa-web.conf-dist ./powa-web.conf
./powa-web
Copy powa-web.conf to /etc/ and start powa web from root user. You should be able to view the web version here - http://172.18.1.9:8888/login

Output of Powa-Web config file:

us02tespd02 ~ # cat /etc/powa-web.conf
servers={
'us02tespd02': {
'host': 'localhost',
'port': '5432',
'database': 'powa'
},
'us02tespd03': {
'host': '172.18.1.8',
'port': '5432',
'database': 'powa'
}
}

cookie_secret="SECRET_STRING"
address="172.18.1.9"
port=8888

Pg_rewind in PostgreSQL

Short Summary:

Pg_rewind is a tool for synchronizing a PostgreSQL cluster with another copy of the same cluster, after the clusters' timelines have diverged. A typical scenario is to bring an old master server back online after failover, as a standby that follows the new master. The result is equivalent to replacing the target data directory with the source one. All files are copied, including configuration files. The advantage of pg_rewind over taking a new base backup, or tools like rsync, is that pg_rewind does not require reading through all unchanged files in the cluster. That makes it a lot faster when the database is large and only a small portion of it differs between the clusters.

Official Documentation Link:

https://www.postgresql.org/docs/9.6/static/app-pgrewind.html

IP Used for the Steps Below:

IP - 198.100.1.10 (Master/Primary)
IP - 198.100.1.11 (Slave/Replica/Secondary)

Pre-Requisite:

Streaming Replication is Enabled between Master and Replica Instance.
wal_log_hints should be enabled in postgresql.conf file.
User used for pg_rewind should be superuser.

Steps:

Suppose there are issue with Master DB(198.100.1.10) and it got crashed for some reason. We need to failover(Promote in PostgreSQL Terms) so that application connects and things are back to normal. Therefore, below command can be used to perform failover to Replica(198.100.1.11).
Command needs to be issued from Replica - /usr/pgsql-9.6/bin/pg_ctl -D /var/lib/pgsql/9.6/data promote
Now we have the new master after promotion/failover i.e. 198.100.1.11 and we need to make sure we have its replica in place as early as possible.
The traditional way to build up replication is to copy the data directory using base_backup and start the replica instance. It can be very time consuming if DB size is in TB. Therefore, to reduce the effort and time, we can use pg_rewind and it sync up the new master and makes replica in few minutes as it only copies the delta of changes.
To sync up the old primary(198.100.1.10) with new primary(198.100.1.11), we need to stop the old primary (198.100.1.10) using the below command issued in old primary.
Command - /usr/pgsql-9.6/bin/pg_ctl -D /var/lib/pgsql/9.6/data stop
Before we run pg_rewind, it is good to backup the below configuration files -
A. pg_hba.conf
B. postgresql.conf
C. postgresql.auto.conf
D. recovery.conf
Now we are good to run pg_rewind in the old primary(To be new replica) using the below command -
Template Command - /usr/pgsql-9.6/bin/pg_rewind --target-pgdata /var/lib/pgsql/9.6/data --source-server="port=<Source DB Port> host=<Source Host> dbname=postgres user=<Source DB Superuse> password=<Password for Superuser>" --progress
Example Command - /usr/pgsql-9.6/bin/pg_rewind --target-pgdata /var/lib/pgsql/9.6/data --source-server="port=5432 host=198.100.1.11 dbname=postgres user=superuser password=SomeStrongPassword" --progress
Sample output after running pg_rewind -
connected to server
servers diverged at WAL position 198/C3C13ED8 on timeline 1
rewinding from last common checkpoint at 198/C3C13E30 on timeline 1
reading source file list
reading target file list
reading WAL in target
need to copy 629 MB (total source directory size is 51531 MB)
644481/644481 kB (100%) copied
creating backup label and updating control file
syncing target data directory
Done!
Once pg_rewind is complete, we need to configure the recovery.conf, pg_hba.conf & postgresql.conf file if they are different in both master and replica.
Sample recovery.conf file contents -
primary_conninfo = 'host=198.100.1.10 port=5432 user=replicator password=StrongPass'
trigger_file = '/var/lib/pgsql/9.6/data/failover'
standby_mode = 'on'
recovery_target_timeline = 'latest'
IP of Replication User should be present in pg_hba.conf so that there is no issue with replication after failover. It is always good to keep the all the configuration file same for both Primary and Replica.
Lastly, its good to check if replication is working fine using the below two commands -
Command 1 -
SELECT
CASE
WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER
END
AS replication_lag;
Command 2 -
Select pg_last_xlog_receive_location();select pg_last_xlog_replay_location(); — This value should change constantly and should have same value.
Other simple way to check if replication is working fine is by creating a new DB in master and check if it replicates in Replica( If DBA is allowed to create test DB)

Auto Explain in PostgreSQL

Link - https://www.postgresql.org/docs/9.4/static/auto-explain.html
Reference Link - https://www.endpoint.com/blog/2012/10/30/postgresql-autoexplain-module

Some Key Points for Auto Explain:

Auto Explain helps analyzing the queries automatically in log file based on the configuration we set up.
Auto Explain can be set up in session level(preferred) and also in postgresql.conf file(which needs a reboot).
There are performance impact if we set up auto explain for all the queries as it takes up additional time to explain the queries and log in log file before executing it actually.
Auto Explain supports from PG version 9.3.
If we add Auto Explain Plan in postgresql.conf file, it would need a reboot of the postgres service.
Auto Explain gets logged in default log file of the postgres instance.

Steps to set up Auto Explain in session( Once we disconnect from session it wont be logging auto explain plan anymore in log file)

We need to issue the below commands so that auto explain starts getting logged in log file.
Command 1 - LOAD 'auto_explain';
Command 2 - SET auto_explain.log_min_duration = 5; ( It is in seconds)
Command 3 - SET auto_explain.log_analyze = true; ( It logs the explain plan in more detailed way)
This process does not needs any reboot or reload.

Steps to configure Auto Explain Plan in postgresql.conf file:

Make the below parameter changes in postgresql.conf file.
Parameter 1 - shared_preload_libraries = ‘auto_explain’;
Parameter 2 - auto_explain.log_min_duration = '5s'
Parameter 3 - auto_explain.log_analyze = true
Restart postgres service.

Wednesday, 15 August 2018

Installing and Configuring PgBadger in Linux

PgBadger is one of the best open source tool for postgres to have a track of slow queries, most frequent queries, users analysis and many more. Therefore, I am making this blog for detailed steps to set up pgbadger in Linux and viewing report in web.

As pgbadger reads the log file of postgres, therefore we need to make changes in the log file so that correct information are logged and pgbadger can read the log file appropriately. Below are some of the changes needs to make in postgres log file -

Log connection - On
Log disconnection - On
Log Duration - off
Log statement - none
Log hostname - On
log_line_prefix = '%t [%p]: [%l-1] db=%d, %u , app=%a, %h '
Log Min Duration Statement - 0
Log Checkpoint - on
lc_messages = ‘C’

Below is the configuration change for the log file size, format and rotation -

log_truncate_on_rotation = off
log_rotation_age = 1d
log_rotation_size = 0

After making all the above changes, verify the generation of log file with the changes made in configuration file. After verifying it time to download the pgbadger and configure it from the below official site -

Link - https://github.com/dalibo/pgbadger/releases ( Version 9.2)

Steps for installing pgbadger in linux( RHEL in my case):

Download the latest pgbadger using wget command - wget https://github.com/dalibo/pgbadger/archive/v9.2.tar.gz
Untar the tar Ball - tar xvf pgbadger-9.2.tar
Install the source code using the below steps-
A. cd pgbadger-9.2
B. perl Makefile.PL
C. Make
D. Make Install

Once we install the pgbadger, we can view the report in text format or we can view the report in HTML format in web. We will see both the process in details -

Text Format -

In this process we simply parse the log file and generate a report in text format. And the text format report is generated in linux only, and we can view the report using less, more or VI command. Below is the command for it -

"pgbadger /data/postgres/data/pg_log/postgresql-2018-07-02_075351.log --outfile /tmp/queries_4.txt"

HTML FORMAT -

For this format, we have multiple steps to achieve this. We need to make sure we have apache or httpd installed and have all the report generated in the root folder so that we can view the reports over web. Below are the details steps -

Firstly, we need to install apache in the linux system using yum or rpm command. Along with it, there are few more dependencies we need to install. They are - GCC, apr-util and apr.
If we do not have apache in repository, instead we can install httpd and the above dependencies.
Lastly, we need to create the report in the apache root directory i.e. "/var/www/html" so that apache can read the html files.
Below is the command to generate html report - pgbadger --incremental --outdir /var/www/html /data/postgres/data/pg_log/postgresql-2018-07-02_075351.log -o out.html
Most important is the cron the above steps and generate the report everyday.
You can view the reports online at http://127.0.0.1:80 if it is localhost or the IP where pgbadger is installed.

I have installed pgbadger many times and have good hands on experience. So, please comment below if you face any issue and I will be there to help you.

Simple Logical PostgreSQL Backup Script (Pg_dump)

Below is the script written by me to make life easier for all DBA. You would just need to change few things in accordance to your environment and the script is ready for use. Also, you can comment in the comment section, if you want any modified script and I can work on that and share with you. Below script is meant for taking logical dump, I did not include much information but as I said you can always tweak but below script is a working script and it makes my life easy. This script can be used for restore process, not for recovery purpose. For that there is a different script using base backup.

Below are some of the details for the script -

1. I am declaring the backup user, backup directory, date, port and small small information which will get repeated in the script.

2. First condition will check if backup directory exist, if not it will create a backup directory.

3. I have also made a table to note down all the details of backup, so that I can query and find out if the backup is successful or failed. It makes a better Audit History.

4. Script is making a note of all the DB to be taken as backup and it is taking the backup of individual backup in for loop and making a note if backup is successful or not.

5. Lastly, I am taking up the backup of global users which is very important and making a entry in table too.

6. Sample how the backup take will look like is mentioned below and it can be taken as reference if you need any similar sort of Backup Strategy.

Here goes the below script -

#!/bin/bash
## Backup Script of Postgres
## Script will take backup of individual database and global objects. Script is good if DB size is small and do not need backup of WAL Files
## Author - rahulsaha0309@gmail.com

set -x

export PGPASSWORD=PassWord!
DATE=`(date +"%Y-%m-%d")`
SIMPLE_DATE=`(date +"%Y%m%d")`
BACKUP_DIRECTORY=/tmp/postgres_backup/$SIMPLE_DATE
BACKUP_USER=postgres
PG_DUMP=/data/postgres/bin/pg_dump
PORT=5432
PSQL=/data/postgres/bin/psql
DUMPALL=/data/postgres/bin/pg_dumpall

if [ ! -d $BACKUP_DIRECTORY ]
then
/bin/mkdir -p $BACKUP_DIRECTORY
else
echo " Backup Directory exist"

fi

TABLE_EXIST=`$PSQL -d audit -t -c "SELECT EXISTS (SELECT 1 FROM information_schema.tables WHERE table_schema = 'public' AND table_name='backup_logs')"`

if [ $TABLE_EXIST = f ]
then
$PSQL -d audit -c "create table backup_logs(date date, database varchar, status varchar, backup_location varchar)"
else
echo "table_exist"
fi

LIST_DB=`$PSQL -t -d postgres -c "select datname from pg_database where datname not like 'postgres';"`

array1=($LIST_DB)
echo ${array1[@]}

for database in "${array1[@]}"

do
BACKUP_DATABASE="$PG_DUMP -U$BACKUP_USER -p $PORT -d $database -f $BACKUP_DIRECTORY/backup_$database.sql"
echo "Taking backup of DATABASE $database"
$BACKUP_DATABASE

if [ $? -eq 0 ]
then
$PSQL -d audit -c "insert into backup_logs values(current_timestamp, '$database', 'Success', '$BACKUP_DIRECTORY/backup_$database.sql')"
else
$PSQL -d audit -c "insert into backup_logs values(current_timestamp, '$database', 'Failed', '$BACKUP_DIRECTORY/backup_$database.sql')"
fi

done

echo "BACKUP_GLOBAL_ROLES"

$DUMPALL -U$BACKUP_USER -p $PORT -g -f $BACKUP_DIRECTORY/backup_roles.sql

Output in table if backup is success or not.

audit=# select * from backup_logs;
date | database | status | backup_location
------------+-----------+---------+----------------------------------------------------
2018-07-24 | template1 | Success | /tmp/postgres_backup/20180724/backup_template1.sql
2018-07-24 | template0 | Failed | /tmp/postgres_backup/20180724/backup_template0.sql
2018-07-24 | test | Success | /tmp/postgres_backup/20180724/backup_test.sql
2018-07-24 | test1 | Success | /tmp/postgres_backup/20180724/backup_test1.sql
2018-07-24 | test2 | Success | /tmp/postgres_backup/20180724/backup_test2.sql
2018-07-24 | new | Success | /tmp/postgres_backup/20180724/backup_new.sql
2018-07-24 | audit | Success | /tmp/postgres_backup/20180724/backup_audit.sql
(7 rows)

Cron Entry:
##Scheduling Postgres Backup - Logical pg_dump backup
21 7 * * * /opt/postgresql/scripts/backup_postgres.sh > /data/postgres_backup/`/bin/date +'\%Y\%m\%d'`/log_file/postgres_backup.out 2>&1

-bash-4.1$ crontab -l
##Scheduling Postgres Backup - Logical pg_dump backup
5 21 * * * /opt/postgresql/scripts/backup_postgres.sh > /data/postgres_backup/backup_log_file/`date +\%Y\%m\%d`-cron.out 2>&1

Pages