The Build

10 May 2018


PostgreSQL Replication at PerconaLive 2018

The slides from my talk on PostgreSQL Replication at PerconaLive 2018 are available.

10 April 2018


Change these: wal_level and wal_hint_bits

(An intermittent series on PostgreSQL parameters whose default settings you should change.)

wal_level = logical

wal_level controls how much information is written into the write-ahead log. Higher settings write more information, enabling more features, at the expense of (somewhat) larger write-ahead log volume.

Just set it to logical, the highest setting. The incremental volume is not that significant, and you are then ready to support logical replication without a restart in case you feel the need.

(If you are running 9.3 or earlier, the highest wal_level setting is hot_standby, but you should have upgraded by now.)

wal_log_hints = on # if you do not have data page checksums enabled.

This setting enables the logging of certain hint-bit changes. Without going into gruesome detail, this is required for certain tools (in particular, pg_rewind) to operate properly. Just turn it on.

If the cluster was initialized with data page checksums, you’re getting hint bit logging anyway, so there’s no need to turn it on (although it doesn’t hurt to do so; it’s ignored on clusters with data page checksums).

6 April 2018


Change these: Logging

(An intermittent series on PostgreSQL parameters whose default settings you should change.)

Here’s my preferred logging configuration:

log_destination = 'csvlog'
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y%m%d-%H%M%S.log'
log_rotation_size = 1GB
log_rotation_age = 1d
log_min_duration_statement = 250ms
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_statement = 'ddl'
log_temp_files = 0
log_autovacuum_min_duration = 1000

Of course, you can adjust these to taste. Set log_directory and log_filename as appropriate for your own directory structure and retention policy. (As listed, the log files will build up forever, so some sort of compress-and-delete strategy is required here.) You can set log_min_duration_statement as appropriate to control log volume; if you can get away with log_min_duration_statement = 0, great.

The others should be set as listed on any system; they don’t take up that much in terms of log volume, and the information provided is very helpful.

3 April 2018


Change these: archive_mode and archive_command

(An intermittent series on PostgreSQL parameters whose default settings you should change.)

By default archive_mode is set to off, and thus archive_command is ignored. Even if you are not going to be using WAL archiving, you should change these. You can use settings such as:

archive_mode = on
archive_command = '/bin/true'

It requires a server restart to turn on archive_mode, but just a reload to change archive_command. Thus, you’re ready to start WAL archiving without a restart by getting these set in advance.

30 March 2018


Change This: wal_compression

(An intermittent series on PostgreSQL parameters whose default settings you should change.)

Introduced in PostgreSQL 9.5, wal_compression is off by default, but you should probably turn it on.

First, what does it do? The documentation helpfully explains:

When this parameter is on, the PostgreSQL server compresses a full page image written to WAL when full_page_writes is on or during a base backup.

By default (that is, when full_page_writes = on, which is the default setting), the first time a data page is changed after a checkpoint, the entire data page is written into the write-ahead log. (Thomas Vondra has a good explanation of this and why it is done).

This results in a significant burst of extra WAL traffic after a checkpoint. (This is one of the reasons to increase the time period between checkpoints). This setting, when on, compresses those full pages images so that they don’t take up quite so much WAL space, reducing both the disk required to hold the WAL, and the amount of network traffic required to send the WAL information down to a streaming replica.

It’s almost always a benefit to turn it on. The only time it might hurt performance is if the database is heavily CPU-limited, so that the (small) extra CPU required to compress the WAL information is significant. That’s unlikely, so turn it on.

27 March 2018


Mountpoints and the Single PostgreSQL Server

Ultimately, a PostgreSQL database is just files, and those files have to be stored somewhere. Absent tablespaces, they are all stored in a single directory structure, traditionally called PGDATA. While smaller instllations can just use the defaults associated with the packaging, larger databases are often installed on their own volume with its own mountpoint.

Two rules to keep in mind when choosing a name for the mountpoint and the directory structure:

  1. Always include the major version number in the directory, but,
  2. Never include the major version in the mountpoint.

For example, /pgsql is a great mountpoint name, but /pgsql/10 isn’t. Instead, call the mountpoint /pgsql and call PGDATA /pgsql/10/data (for example) under it.


At some point, you’ll probably want to have more than one major version installed on the same server. For example, you may want to use pg_upgrade to upgrade between major versions. In --link mode (which is the most sensible way to use pg_upgrade), you need to create a new cluster for the major version, but that new cluster needs to be on the same file system as the old cluster, so that pg_upgrade can create hard links between the data files.

If the major version is baked into the mountpoint, you start getting things like /pgdata/9.6/10 and similar awful situations.

So, make life easy for yourself and make the mountpoint generic, and use a directory path with the major version in it for PGDATA.

20 March 2018


Why PostgreSQL is Terrible? at Nordic PGDay

The slides from my talk at Nordic PGDay, Why PostgreSQL is Terrible, are now available.

2 January 2018


A Replication Cheat-Sheet

So many types of replication, so little time! What kind of PostgreSQL replication should you use? Read the rest of this entry »

30 October 2017


Commitment Issues

One of the essentials of any database system is that a transaction is either in progress, committed, or rolled back. But consider what happens if…

UPDATE table SET money=money+100000;
-- And you get an error that the server has disconnected

Did that transaction commit, or didn’t it? You don’t really know, because you don’t know if the server even received the COMMIT statement; if it didn’t, the transaction might have aborted due to the client disconnection.

This isn’t that important for the vast majority of situations, but if you are (for example) building a two-phase commit architecture, it might be vital to know if the transaction really did commit or not.

In PostgreSQL, you can find this out using txid_current() and txid_status():

UPDATE table SET money=money+100000;
SELECT txid_current();

(1 row)
-- And you get an error that the server has disconnected

Now, you can use that result to test the status of that transaction (as long as it is still available in the commit log). So, if a disconnection occurs, the client can reconnect and:

SELECT txid_status(8168);

(1 row)

Now, the client knows it needs to rerun that transaction.

Again, most client applications don’t need this level of detail, but it’s great to have the capability if you do.

27 October 2017


Streaming replication stopped? One more thing to check.

We recently were asked by a client to look at a strange replication situation. Replication had simply stopped from a primary to a streaming replication secondary. Everything looked fine:

After investigation, and some back-and-forth on mailing lists, we discovered the problem was a data ingestion procedure. At the highest level, the stored procedure created a temporary table, filled it, did some data analysis on it, and then inserted the results into a permanent table. Two changes had been made just before the replication stopped:

  1. The number of temporary tables created in each “run” of the stored procedure had been increased from 10 to 20,000.
  2. A bug had been introduced that caused an exception within the loop, but that exception had been caught and discarded.

Logically, what was happening resembled this:

DO $$
  i int := 1;
  FOR i IN 1..20000 LOOP
         PERFORM f();
         i := i / 0;
        WHEN division_by_zero THEN
$$ language plpgsql;

where f() is:

   ON COMMIT DROP AS SELECT i FROM generate_series(1, 100) i;
$$ language plpgsql;

So, what happened?

Each time PostgreSQL creates a temporary table (or any table, for that), it takes an ExclusiveLock on it. That lock is held until the end of the transaction. In this case, each time the exception was raised, that rolled back a savepoint that PL/pgSQL implicitly creates at the start of the BEGIN/END, releasing the lock.

But only on the primary.

The secondary received the lock (one of the only types of lock a secondary can have), and held it until the entire wrapping transaction commits or rolls back. This are a lot of locks in this case. PostgreSQL handles this situation, but not entirely gracefully: the time needed up clean up all those locks when the COMMIT arrives at the secondary appears to be O(N^2), N being the number of locks.

Reducing the number of iterations (and fixing the bug) caused the issue to go away.

Interestingly, we’ve not been able to reproduce in a simple test case the lack-of-error-messages; in the test case above, all kinds of errors are emitted on the secondary.

Special thanks to Tom Lane for helping track this down.

« Older Entries