Hi Tom, hi list participants,
thanks a lot for replying, Tom.
On 27.04.21 22:16, Tom Lane wrote:
> Tomas Pospisek <tpo2@xxxxxxxxxxxxx> writes:
>> I maintain a postgresql cluster that does failover via patroni. The
>> problem is that after a failover happens it takes the secondary too long
>> (that is about 35min) to come up and answer queries. The log of the
>> secondary looks like this:
>>
>> 04:00:29.777 [9679] LOG: received promote request
>> 04:00:29.780 [9693] FATAL: terminating walreceiver process due to
>>   administrator command
>> 04:00:29.780 [9679] LOG: invalid record length at 320/B95A1EE0: wanted
>>   24, got 0
>> 04:00:29.783 [9679] LOG: redo done at 320/B95A1EA8
>> 04:00:29.783 [9679] LOG: last completed transaction was at log time
>>   2021-03-03 03:57:46.466342+01
>> 04:35:00.982 [9679] LOG: selected new timeline ID: 15
>> 04:35:01.404 [9679] LOG: archive recovery complete
>> 04:35:02.337 [9662] LOG: database system is ready to accept connections
>>
>> The cluster is "fairly large" with thousands of DBs (sic!) and ~1TB of data.
>
> Hm. WAL replay is already done at the "redo done" entry. There is a
> checkpoint after that, I believe, and there may be some effort to search
> for dead files as well. Still, if your I/O subsystem is better than
> a wet noodle, 35 minutes is a long time to finish that.
>
> One thing I'm not sure about is whether we try to do the checkpoint
> at maximum speed. If you have set GUC options to throttle checkpoint
> I/O hard, that could perhaps explain this.
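For reference, the ~35min I mentioned is simply the gap in the log above between
"redo done" (04:00:29) and "selected new timeline ID" (04:35:00); with GNU date:

```shell
# Compute the promotion stall from the two log timestamps
# (fractional seconds dropped; GNU date assumed)
t1=$(date -u -d "1970-01-01 04:00:29" +%s)
t2=$(date -u -d "1970-01-01 04:35:00" +%s)
echo "stall: $(( (t2 - t1) / 60 )) minutes"
# prints "stall: 34 minutes"
```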
I didn't do much customization of the postgresql settings and am mostly
running what upstream provides. These are my explicit settings (everything
else is at its default):
allow_system_table_mods: 'off'
autovacuum: 'off'
checkpoint_completion_target: '0.7'
effective_cache_size: 16 GB
maintenance_work_mem: 1 GB
max_connections: 200
max_wal_size: 16 GB
shared_buffers: 8 GB
shared_preload_libraries: pg_stat_statements
track_activities: 'on'
track_activity_query_size: 32 kB
track_functions: all
wal_keep_segments: 100
work_mem: 64 MB
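To double-check that nothing beyond the list above deviates from the built-in
defaults (and in particular that no checkpoint-throttling GUCs are set), the
effective settings can be pulled from pg_settings. A sketch, assuming a local
psql connection with sufficient privileges:

```shell
# Every setting whose value doesn't come from the built-in default
psql -Atc "SELECT name, setting, source FROM pg_settings WHERE source <> 'default';"
# The checkpoint-pacing GUCs Tom alluded to
psql -Atc "SHOW checkpoint_completion_target;"
psql -Atc "SHOW checkpoint_timeout;"
```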
> You could possibly learn more by strace'ing the startup process to
> see what it's doing.
Will do, thanks. However, I'm dreading the next failover downtime :-(
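For the record, I'm planning to capture it with something along these lines
during the next promotion (the pgrep pattern is an assumption about the
process title; -tt adds timestamps, -T per-syscall durations):

```shell
# Attach to the recovery ("startup") process and log its syscalls
# (requires permission to ptrace; output path is just an example)
sudo strace -p "$(pgrep -f 'postgres: startup')" \
    -tt -T -f -o /tmp/startup.strace
```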
> Also, what PG version is that exactly?
12.6-1.pgdg20.04+1
Thankful for any help or pointers regarding the long promotion time,
*t