PostgreSQL is an amazing database engine. Like any well-engineered system, it has strengths and weaknesses. To make effective use of it, one has to look a level deeper and understand how things work.

This post is the first in a series we've planned around the internals of Postgres.

At Algoshelf, we are building next-generation tools for demand planning, and this requires working with large amounts of data. Postgres is one of our favorite tools for the job.

Because we work with large amounts of data, it is essential to understand how much space each table, each row, and each cell takes on disk, so that we can reason about performance confidently.

A Small Experiment

Postgres has a very simple extension called pgstattuple that provides tuple-level (row-level) statistics.

Postgres internally calls each row a tuple, so wherever you see a mention of a tuple, it just means one row in a database table.

You can enable the extension with the following command.

$ psql testdb
testdb=# create extension pgstattuple;
CREATE EXTENSION

Let’s create a small table and peek inside.

testdb=# create table test (a integer);
CREATE TABLE

testdb=# select * FROM pgstattuple('test');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
         0 |           0 |         0 |             0 |                0 |              0 |                  0 |          0 |            0
(1 row)

As you can see, the table has no rows and takes no space on disk.

Let’s insert a row and see how much space it takes.

testdb=# insert into test (a) values (0);
INSERT 0 1
testdb=# select * FROM pgstattuple('test');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
      8192 |           1 |        28 |          0.34 |                0 |              0 |                  0 |       8128 |        99.22
(1 row)

As you can see, there is now one row taking 28 bytes. But why is it taking 28 bytes when an integer needs only 4? Each tuple starts with a header containing metadata about that tuple, and that header takes 24 bytes. That is the overhead we pay for every tuple.
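The arithmetic can be sketched in a few lines of Python. The 24-byte figure comes from the tuple header, which is 23 bytes on common 64-bit builds, padded up to the platform's 8-byte alignment boundary; both constants below are assumptions about a typical 64-bit build, not values the extension reports.

```python
HEADER_BYTES = 23   # sizeof(HeapTupleHeaderData), assumed 64-bit build
MAXALIGN = 8        # platform alignment boundary (assumption)

def maxalign(n: int) -> int:
    """Round n up to the next multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN

def tuple_len(data_bytes: int) -> int:
    """Rough estimate of tuple_len as reported by pgstattuple."""
    return maxalign(HEADER_BYTES) + data_bytes

print(tuple_len(4))  # a single 4-byte integer column -> 28 bytes
```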

Postgres allocates table storage in chunks of 8 KB pages and keeps track of the free space within each page.

Deletes and Vacuum

What happens when we delete some rows from the database?

Let’s start by adding a new row.

testdb=# insert into test (a) values (1);
INSERT 0 1
testdb=# select * FROM pgstattuple('test');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
      8192 |           2 |        56 |          0.68 |                0 |              0 |                  0 |       8092 |        98.78
(1 row)

Now the table has two rows using 56 bytes in total, 28 bytes each.
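The free_space numbers can be reproduced with a small model. A heap page starts with a page header (24 bytes on typical 64-bit builds); each row costs a 4-byte line pointer at the front of the page plus the tuple itself, stored rounded up to 8-byte alignment (28 → 32 bytes); and pgstattuple's free-space figure appears to reserve room for one more line pointer. All of these constants are assumptions about a common 64-bit build, sketched here rather than taken from the extension:

```python
PAGE_SIZE = 8192
PAGE_HEADER = 24      # sizeof(PageHeaderData), assumed 64-bit build
LINE_POINTER = 4      # per-row item pointer stored in the page

def maxalign(n: int) -> int:
    return (n + 7) // 8 * 8

def free_space(nrows: int, tuple_len: int = 28) -> int:
    """Model of pgstattuple's free_space for one page of identical rows.
    The trailing LINE_POINTER mirrors Postgres reserving space for one
    future line pointer when reporting free space."""
    used = PAGE_HEADER + nrows * (LINE_POINTER + maxalign(tuple_len))
    return PAGE_SIZE - used - LINE_POINTER

print(free_space(1))  # 8128, matching the one-row output earlier
print(free_space(2))  # 8092, matching the two-row output above
```

The same model suggests one page can hold about (8192 − 24) // 36 ≈ 226 of these 28-byte rows.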

Let’s see what happens if we delete a row.

testdb=# delete from test where a=0;
DELETE 1
testdb=# select * FROM pgstattuple('test');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
      8192 |           1 |        28 |          0.34 |                1 |             28 |               0.34 |       8092 |        98.78
(1 row)

Deleted rows become dead tuples, and they’ll remain there until a vacuum reclaims them. Vacuum is an operation that Postgres runs periodically to reclaim dead tuples, but it is also possible to trigger it manually.

testdb=# vacuum verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": removed 1 row versions in 1 pages
INFO:  "test": found 1 removable, 1 nonremovable row versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 188907
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s.
VACUUM
testdb=# select * FROM pgstattuple('test');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
      8192 |           1 |        28 |          0.34 |                0 |              0 |                  0 |       8124 |        99.17
(1 row)

As you can see from the above example, the vacuum operation has reclaimed the dead tuple.
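The post-vacuum free_space of 8124 also fits the page model: the dead tuple's 32 bytes of data are gone, but its slot in the line pointer array is still occupied (now marked unused), so the page carries two line pointers and one live tuple. A sketch under the same assumed 64-bit constants:

```python
PAGE_SIZE, PAGE_HEADER, LINE_POINTER = 8192, 24, 4

def maxalign(n: int) -> int:
    return (n + 7) // 8 * 8

# Two line pointers remain (one live, one unused after vacuum), but only
# one 28-byte tuple's data is stored; one more pointer's worth of space
# is reserved, as in the free-space model earlier.
free_after_vacuum = (PAGE_SIZE - PAGE_HEADER
                     - 2 * LINE_POINTER   # live + unused pointers
                     - maxalign(28)       # the surviving tuple
                     - LINE_POINTER)      # reserved for the next row
print(free_after_vacuum)  # 8124, matching free_space above
```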

Updates

What about updates to one or more rows? Will Postgres update a row in place, or will it allocate a new one?

testdb=# update test set a=a+1;
UPDATE 1
testdb=# select * FROM pgstattuple('test');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
      8192 |           1 |        28 |          0.34 |                1 |             28 |               0.34 |       8092 |        98.78
(1 row)

It looks like the update operation makes a copy of the row, modifies the copy, and marks the old version as dead. Why isn’t Postgres making an in-place edit? Wouldn’t that be more efficient?

Given that a database has to support concurrent writes and transactional semantics, it must be able to roll back to the previous state even after a change. Postgres achieves that using Multiversion Concurrency Control (MVCC), a sophisticated mechanism for handling concurrent writes without locking the database.

So, any update to a row makes a copy of the whole row. That means the more columns you have in the table, the more time the update is going to take.
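A back-of-the-envelope sketch of why wide rows make updates more expensive, reusing the same size model. It makes the simplifying (and hypothetical) assumptions that every column is a 4-byte integer and that the null bitmap and per-column padding can be ignored:

```python
def maxalign(n: int) -> int:
    return (n + 7) // 8 * 8

def bytes_written_per_update(ncols: int) -> int:
    """Approximate bytes one single-row UPDATE writes for the new row
    version: a line pointer plus a full copy of the row, assuming
    4-byte integer columns only (a deliberate simplification)."""
    header = maxalign(23)                  # 24-byte tuple header
    return 4 + maxalign(header + 4 * ncols)

print(bytes_written_per_update(1))   # 36 bytes for a 1-column row
print(bytes_written_per_update(50))  # 228 bytes for a 50-column row
```

Even if only one of the 50 columns changed, the whole row is rewritten, roughly six times the bytes of the narrow row.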

If you have a table with many columns and only a few of them are updated very frequently, it may be a good idea to split it into two tables, keeping the most frequently updated columns separate. Of course, that makes reads a bit slower, as you need to join the tables when reading. After all, engineering is all about picking the right trade-offs.

Epilogue

Databases are an essential component of modern software development. Very often, we interact with the database through an ORM layer and don’t pay attention to what is happening underneath.

While abstractions have their place, the genius is in the details!


We’re hiring! We are keen to work with enthusiastic engineers who are passionate about product development, software engineering and machine learning. Please visit our careers page for more details.