Monday, 29 January 2018

Partitions in PostgreSQL

Below are some important points regarding "Partitions" in PostgreSQL -
  1. PostgreSQL supports basic table partitioning until Version 9.6
  2. Query Performance can be increased dramatically situations when most of the heavily accessed rows of the table are in a single partition. 
  3. Partitions substitutes for leading column index, reducing size and make it more likely that the heavily used parts fits the memory. 
  4. When queries or updates access large partition of single partition, performance can be improved by taking advantage of sequential scan instead using index scan and random access read scattered across the table. 
  5. Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design
  6. ALTER TABLE NO INHERIT and DROP TABLE are both far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.
  7. Each partition must be created as a child table of a single parent table.
  8. The parent table itself is normally empty; it exists just to represent the entire data set.
  9. PostgreSQL supports partitioning via table inheritance until version 9.6
  10. Declarative partitioning is the new feature which is supported from PostgreSQL version 10.0
PARTITION USING TABLE INHERITANCE:
  1. Create the “parent” table, from which all of the partitions will inherit.
  2. Create several “child” tables (each representing a partition of the data) that each inherit from the parent.
  3. Add constraints to the partition tables to define the row values in each partition.
  4. Create indexes on any parent and child tables individually. (Indexes do not propagate from the parent tables to child tables).
  5. Write a suitable trigger function to the master table so that inserts into the parent table redirect into the appropriate partition table.
  6. Create a trigger that calls the trigger function.
  7. Redefine the trigger function when the set of child tables changes.
PROBELEMS:
  1. Data consistency depends on the CHECK constraints specified on each child table.
  2. INSERT and COPY commands do not automatically propagate data to other child tables in the inheritance hierarchy, but instead rely on triggers, resulting in slower inserts.
  3. Substantial manual work is required to create and maintain child sub-tables.
  4. Given 3, re-partitioning is non-trivial (e.g., when adding disks, making schema changes).
  5. Ensuring uniqueness across partitioned tables is not supported.
  6. Indexes, constraints, and many table maintenance commands need to be applied to child tables explicitly. This greatly complicates management tasks.
DECLARATIVE PARTITIONS ( Version 10 Onwards):

Currently, declarative partitioning supports RANGE and LIST partitions:

RANGE — partition the table into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, device_id.

LIST— partition the table by explicitly listing which key values appear in each partition, and are limited to a single column. For example, device_type.

How does it work?

Create a partitioned table using the PARTITION BY clause, which includes the partitioning method (RANGE in this example) and the list of column(s) to use as the partition key.

CREATE TABLE measurement (
 city_id int not null,
 logdate date not null,
 peaktemp int,
 unitsales int
) PARTITION BY RANGE (logdate);

After creating the partitioned table, partitions can be manually created:

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
 FOR VALUES FROM (‘2006–02–01’) TO (‘2006–03–01’);

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
 FOR VALUES FROM (‘2006–03–01’) TO (‘2006–04–01’);

CREATE TABLE measurement_y2006m04 PARTITION OF measurement
 FOR VALUES FROM (‘2006–04–01’) TO (‘2006–05–01’);
...
CREATE TABLE measurement_y2007m11 PARTITION OF measurement
 FOR VALUES FROM (‘2007–11–01’) TO (‘2007–12–01’);

And then indexes can be created:

CREATE INDEX ON measurement_y2006m02 (logdate);
CREATE INDEX ON measurement_y2006m03 (logdate);
CREATE INDEX ON measurement_y2006m04 (logdate);
CREATE INDEX ON measurement_y2007m11 (logdate);

The partitioned table functions similarly to the parent or “master” table in table inheritance but is much improved, serving as a much more powerful query interface to the underlying child table partitions. For instance, commands such as TRUNCATE and COPY now propagate to child tables via execution on the partitioned table. Additionally, users can insert data into underlying child tables via the partitioned table, since tuples (i.e., rows) now automatically route to the right partition on INSERT and no longer depend on triggers, as is the case with table inheritance.

No comments:

Post a Comment