Re: improve wals replay on secondary

Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> · Mon, 27 May 2019 13:17:49 +0300

standby_mode = 'on'
primary_conninfo = 'host=X.X.X.X user=repmgr  connect_timeout=10 '
recovery_target_timeline = 'latest'
primary_slot_name = repmgr_slot_1
restore_command = 'rsync -avzhe ssh postgres@x.x.x.x:/var/lib/pgsql/archive/%f /var/lib/pgsql/archive/%f ; gunzip < /var/lib/pgsql/archive/%f > %p'
archive_cleanup_command = '/usr/pgsql-9.6/bin/pg_archivecleanup /var/lib/pgsql/archive %r'
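
For anyone reading the recovery.conf above: PostgreSQL substitutes %f (the WAL file name), %p (the path to restore the file to) and %r (the file containing the last valid restart point) before the command is passed to the shell. Below is a minimal Python sketch of that substitution, only to show what the shell would end up running for one segment. The helper name expand_placeholders and the target path pg_xlog/RECOVERYXLOG are assumptions made for illustration; this is not PostgreSQL's own code.

```python
import re

# Placeholders documented for restore_command: %f, %p, %r, and %% for a literal %.
_PLACEHOLDER = re.compile(r"%([fpr%])")


def expand_placeholders(template, wal_file, target_path, restart_file=""):
    """Return the command string with %f, %p, %r and %% substituted.

    Hypothetical helper for illustration only; PostgreSQL performs this
    substitution internally before running the command.
    """
    values = {"f": wal_file, "p": target_path, "r": restart_file, "%": "%"}
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Second half of the restore_command from the recovery.conf above.
template = "gunzip < /var/lib/pgsql/archive/%f > %p"

# The segment name is taken from the log in this thread; the target path is
# an assumed example value.
print(expand_placeholders(template, "000000010000377B000000DE", "pg_xlog/RECOVERYXLOG"))
```

Printing the expanded command this way makes it easier to run the same steps by hand as the postgres user and see whether the rsync and gunzip parts each succeed.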

On Mon, 27 May 2019 at 12:29, Fabio Pardi <f.pardi@xxxxxxxxxxxx> wrote:
Hi Mariel,

let's keep the list in cc...

settings look ok.

what's in the recovery.conf file then?

regards,

fabio pardi

On 5/27/19 11:23 AM, Mariel Cherkassky wrote:

> Hey,
> the configuration is the same as in the primary:
> max_wal_size = 2GB
> min_wal_size = 1GB
> wal_buffers = 16MB
> checkpoint_completion_target = 0.9
> checkpoint_timeout = 30min
>
> Regarding your question, I didn't see this message ("consistent recovery
> state reached at"), so I guess that's why the secondary isn't available yet.
>
> Maybe I'm wrong, but what I understood from the documentation is that a
> restart point is generated only after the secondary has a checkpoint, which
> means only after 30 minutes or after max_wal_size is reached? But still, why
> won't the secondary reach a consistent recovery state (does it require a
> restart point to be generated?)
>
> On Mon, 27 May 2019 at 12:12, Fabio Pardi <f.pardi@xxxxxxxxxxxx> wrote:
>
>     Hi Mariel,
>
>     if i'm not wrong, on the secondary you will see the messages you
>     mentioned when a checkpoint happens.
>
>     What are checkpoint_timeout and max_wal_size on your standby?
>
>     Did you ever see this in your standby log?
>
>     "consistent recovery state reached at .."
>
>     Maybe you can post the whole configuration of your standby for easier
>     debugging.
>
>     regards,
>
>     fabio pardi
>
>     On 5/27/19 10:49 AM, Mariel Cherkassky wrote:
>     > Hey,
>     > PG 9.6, I have a standalone server configured. I tried to start up a
>     > secondary and ran standby clone (repmgr). The clone process took 3 hours
>     > and during that time WALs were generated (mostly because of the
>     > checkpoint_timeout).
>     > As a result, when I start the secondary, I see that the
>     > secondary keeps getting the WALs but I don't see any messages that
>     > indicate that the secondary tried to replay the WALs.
>     > Messages that I see:
>     > receiving incremental file list
>     > 000000010000377B000000DE
>     >
>     > sent 30 bytes  received 4.11M bytes  8.22M bytes/sec
>     > total size is 4.15M  speedup is 1.01
>     > 2019-05-22 12:48:10 EEST  60942  LOG:  restored log file "000000010000377B000000DE" from archive
>     > 2019-05-22 12:48:11 EEST db63311  FATAL:  the database system is starting up
>     > 2019-05-22 12:48:12 EEST db63313  FATAL:  the database system is starting up
>     >
>     > I was hoping to see the following messages (taken from a different
>     > machine):
>     > 2019-05-27 01:15:37 EDT  7428  LOG:  restartpoint starting: time
>     > 2019-05-27 01:16:18 EDT  7428  LOG:  restartpoint complete: wrote 406 buffers (0.2%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=41.390 s, sync=0.001 s, total=41.582 s; sync files=128, longest=0.000 s, average=0.000 s; distance=2005 kB, estimate=2699 kB
>     > 2019-05-27 01:16:18 EDT  7428  LOG:  recovery restart point at 4/D096C4F8
>     >
>     > My primary settings (WAL settings):
>     > wal_buffers = 16MB
>     > checkpoint_completion_target = 0.9
>     > checkpoint_timeout = 30min
>     >
>     > Any idea what can explain why the secondary doesn't replay the WALs?
> 

>
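
A note on reading the segment names in the log quoted above: assuming the default 16 MB WAL segment size, a 24-hex-digit file name encodes the timeline, the high 32 bits of the LSN, and the segment number within that 4 GB range. The Python sketch below decodes names taken from "restored log file" lines so that two names can be compared to see how far the restore has advanced. All function names are made up for illustration, and the result says nothing by itself about whether a consistent recovery state has been reached.

```python
import re

WAL_SEGMENT_SIZE = 16 * 1024 * 1024              # assumption: default 16 MB segments
SEGMENTS_PER_LOG_ID = 2**32 // WAL_SEGMENT_SIZE  # 256 with 16 MB segments

RESTORED = re.compile(r'restored log file "([0-9A-F]{24})" from archive')


def parse_wal_name(name):
    """Split a WAL file name into (timeline, absolute segment number)."""
    if not re.fullmatch(r"[0-9A-F]{24}", name):
        raise ValueError("not a WAL segment file name: %r" % name)
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_id * SEGMENTS_PER_LOG_ID + segment


def segment_start_lsn(name):
    """Return the LSN at which the segment begins, formatted as in the logs."""
    _, segno = parse_wal_name(name)
    offset = segno * WAL_SEGMENT_SIZE
    return "%X/%X" % (offset >> 32, offset & 0xFFFFFFFF)


def segments_between(older, newer):
    """Number of segments from one file name to another on the same timeline."""
    timeline_a, segno_a = parse_wal_name(older)
    timeline_b, segno_b = parse_wal_name(newer)
    if timeline_a != timeline_b:
        raise ValueError("segments are on different timelines")
    return segno_b - segno_a


def restored_segments(log_text):
    """Extract segment names from 'restored log file' lines, in log order."""
    return RESTORED.findall(log_text)


# Log line copied from the thread, joined back onto a single line.
log_line = ('2019-05-22 12:48:10 EEST  60942  LOG:  restored log file '
            '"000000010000377B000000DE" from archive')
for name in restored_segments(log_line):
    print(name, "starts at LSN", segment_start_lsn(name))
```

Running restored_segments over the whole standby log and passing the first and last names to segments_between gives a rough count of how many segments have been restored so far.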