> There was a recent commit for a similar performance problem, which will appear in 9.6.10. But that was specifically for cases where there were multiple dropped tables per transaction, and large shared_buffers.
Interesting, and good to know, thanks! I'm not sure we fall into either category (is 8 GB large? It's larger than the default, but I always thought "large" meant more than 8 GB for this setting), but it sounds like this sort of problem is on the developers' radar. It's possible that up to 4 tables were dropped per transaction in prod, though I don't know whether that counts as "multiple", and in testing I reproduced the problem with 1 drop per transaction.
> I can't reproduce your single-drop-per-transaction problem. The replica has no problem keeping up with the master.
It's possible that the problem only occurs when the replica is on inferior hardware; I wasn't able to test two equal servers in the time I had. When the superior server was the replica, it had no trouble keeping up with the inferior primary. Dropping tables was the only action the inferior server couldn't keep up with as a standby, and the only action it handled faster as a standalone instance than as a replica. I did not test truncates; it's possible I would have seen the same problem with them.
> Can you share the reproduction scripts
For the table drops, I prepped by running these:
CREATE TABLE IF NOT EXISTS test1 (id int);
CREATE TABLE IF NOT EXISTS test2 (id int);
CREATE TABLE IF NOT EXISTS test3 (id int);
...
CREATE TABLE IF NOT EXISTS test100000 (id int);
Then I dropped with these:
DROP TABLE IF EXISTS test1;
DROP TABLE IF EXISTS test2;
DROP TABLE IF EXISTS test3;
...
DROP TABLE IF EXISTS test100000;
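In case anyone wants to regenerate the full files rather than type them out, a shell loop along these lines produces them (create_tables.sql is just an arbitrary name; drop_tables.sql matches the file I time below):

# write the 100,000 CREATE statements to a file (file name is arbitrary)
for i in $(seq 1 100000); do
    echo "CREATE TABLE IF NOT EXISTS test$i (id int);"
done > create_tables.sql

# write the matching DROP statements
for i in $(seq 1 100000); do
    echo "DROP TABLE IF EXISTS test$i;"
done > drop_tables.sql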
For the inserts, I did a
CREATE TABLE test (content1 TEXT, content2 TEXT, content3 TEXT);
followed by 300 of these statements, each of which inserts a batch of 10,000 rows with a random 135-character string in each of the three columns:
INSERT INTO test
SELECT
    array_to_string(ARRAY(SELECT chr((48 + round(random() * 9)) :: integer) FROM generate_series(1,135)), ''),
    array_to_string(ARRAY(SELECT chr((48 + round(random() * 9)) :: integer) FROM generate_series(1,135)), ''),
    array_to_string(ARRAY(SELECT chr((48 + round(random() * 9)) :: integer) FROM generate_series(1,135)), '')
FROM (SELECT generate_series(1,10000)) foo;
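The 300 repetitions can be generated the same way, e.g. by saving that single INSERT to its own file and repeating it (both file names here are arbitrary):

# one_insert.sql holds just the single INSERT statement above
for i in $(seq 1 300); do
    cat one_insert.sql
done > insert_batches.sql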
I ran and timed each script from a file, e.g.:
time psql -d mydb -f drop_tables.sql
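While a script runs, the standby's lag can be watched with the usual monitoring functions on the replica (these are standard 9.6 functions, not part of the scripts above):

-- approximate replay lag: now() vs. the commit time of the last replayed transaction
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;

-- WAL position received from the primary vs. position actually replayed
SELECT pg_last_xlog_receive_location() AS received,
       pg_last_xlog_replay_location() AS replayed;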
> any non-default config settings? Especially the setting of shared_buffers (on both master and replica, if different)
Our shared_buffers setting is 8 GB on all nodes.
In prod, we had a primary and two standbys. One standby was very similar to the primary in hardware, but older and with fewer CPUs. The other standby was far inferior in hardware. For testing, I used the two standbys, and was able to reproduce the problem when I made the superior standby the primary.
These are the non-default settings on the primary and the standby of comparable hardware (other than changing file and directory paths, which shouldn't affect anything):
listen_addresses = '*'
log_destination = 'stderr, syslog'
log_filename = 'postgresql-%Y-%m-%d.log'
log_line_prefix = '%p [%m]: %u %d %q %h %a %v %x'
syslog_facility = 'local1'
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_timezone = 'US/Central'
log_statement = 'ddl'
track_functions = pl
track_activity_query_size = 4096
timezone = US/Central
client_encoding = UTF8
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
max_connections = 1024
tcp_keepalives_idle = 7200
tcp_keepalives_interval = 75
tcp_keepalives_count = 9
work_mem = '48 MB'
maintenance_work_mem = '1 GB'
max_locks_per_transaction = 8192
random_page_cost = 2.0
effective_cache_size = '94GB'
log_autovacuum_min_duration = 10s
autovacuum_naptime = 2min
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 250
autovacuum_freeze_max_age = 1000000000
autovacuum_vacuum_cost_delay = 10ms
autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.01
wal_buffers = 16MB
bgwriter_lru_maxpages = 250
max_wal_senders = 5
wal_keep_segments = 256
hot_standby = on
log_min_duration_statement = 2s
wal_level = replica
wal_log_hints = on
archive_mode = on
archive_command = 'ssh postgres@backupserver "test ! -f /opt/backup/db1/%f" && rsync -q %p postgres@backupserver:/opt/backup/db1/%f'
max_parallel_workers_per_gather = 4
max_worker_processes = 16
The only setting that differs on the second, far-inferior standby is effective_cache_size, which is 24 GB instead of 94 GB.
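If it's easier to compare against your own nodes, the same non-default list can be pulled from a running server with a query like this (standard pg_settings view; note it also picks up anything set per-session by the client):

SELECT name, setting, source
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;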
To rule out fetching from the remote archive as the bottleneck, I tried scp-ing a bunch of WAL files to the standby and resetting the restore command to replay from the local archive. Same performance problem, and only when dropping tables.
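For that test the restore command was just the standard local-copy form, something along these lines (the archive path shown is only illustrative):

# in recovery.conf on the standby; the directory below is wherever the WAL files were copied
restore_command = 'cp /path/to/local_archive/%f %p'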
Best,
Sherrylyn