Re: improve wals replay on secondary

Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> · Wed, 29 May 2019 14:10:59 +0300

In my case, I'm talking about creating a standby -> standby clone and then recover the wals . How can I run checkpoint in the secondary if it isnt up ?

‫בתאריך יום ד׳, 29 במאי 2019 ב-13:54 מאת ‪Nikolay Samokhvalov‬‏ <‪samokhvalov@xxxxxxxxx‬‏>:‬
Hello Mariel,

1) Have you tried to run “CHECKPOINT;” on the standby to perform restartpoint explicitly? It is possible.

2) If we talk about streaming replication, do you use replication slots? If so, have you checked pg_replication_slots and pg_stat_replication? They can help to troubleshoot streaming replication delays — see columns sent_location, write_location, flush_location, and replay_location in pg_stat_replication and restart_lsn in pg_replication_slots. If you have delay in replaying, it should be seen there.

On Wed, May 29, 2019 at 11:39 Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> wrote:
Is there any messages that indicates that the secondary replayed a specific wal ? "restored 00000..." means that the restore_command succeeded but there isnt any proof that it replayed the wal.

My theory regarding the issue : 
 It seems, that my customer stopped the db 20 minutes after the clone have finished.  During those 20 minutes the secondary didnt get enough wal records (6 wal files) so it didnt reach the max_wal_size. My checkpoint_timeout is set to 30minutes, therefore there wasnt any checkpoint. As a result of that the secondary didnt reach a restart point. Does that sounds reasonable ?

So basically, if I clone a small primary db, the secondary would reach a restart point only if it reached a checkpoint (checkpoint_timeout or max_wal_size). However, I have cloned many small dbs and saw the it takes a sec to start the secondary (which means that restartpoint was reached). So what am I missing ?

‫בתאריך יום ד׳, 29 במאי 2019 ב-11:20 מאת ‪Fabio Pardi‬‏ <‪f.pardi@xxxxxxxxxxxx‬‏>:‬

On 5/29/19 9:20 AM, Mariel Cherkassky wrote:

> First of all thanks Fabio.

> I think that I'm missing something : 

> In the next questionI'm not talking about streaming replication,rather

> on recovery : 

> 

> 1.When the secondary get the wals from the primary it tries to replay

> them correct ? 

correct

> 

> 2. By replaying it just go over the wal records and run them in the

> secondary ?

> 

correct

> 3.All those changes are saved in the shared_buffer(secondary) or the

> changed are immediately done on the data files blocks ?

> 

the changes are not saved to your datafile yet. That happens at

checkpoint time.

> 4.The secondary will need a checkpoint in order to flush those changes

> to the data files and in order to reach a restart point ?

> 

yes

> So, basically If I had a checkpoint during the clone, the secondary

> should also have a checkpoint when I recover the secondary right ?

> 

correct. Even after being in sync with master, if you restart Postgres

on standby, it will then re-apply the WAL files from the last checkpoint.

In the logfile of the standby, you will see as many messages reporting

"restored log file" as many WAL files were produced since the last

checkpoint

Hope it helps to clarify.

regards,

fabio pardi

> 

> ‫בתאריך יום ג׳, 28 במאי 2019 ב-13:54 מאת ‪Fabio Pardi‬‏

> <‪f.pardi@xxxxxxxxxxxx <mailto:f.pardi@xxxxxxxxxxxx>‬‏>:‬

> 

>     Hi Mariel,

> 

>     please keep the list posted. When you reply, use 'reply all'. That

>     will maybe help others in the community and you might also get more

>     help from others.

> 

>     answers are in line here below

> 

> 

> 

>     On 28/05/2019 10:54, Mariel Cherkassky wrote:

>     > I have pg 9.6, repmgr version 4.3 .

>     > I see in the logs that wal files are restored : 

>     > 2019-05-22 12:35:12 EEST  60942  LOG:  restored log file

>     "000000010000377B000000DB" from archive

>     > that means that the restore_command worked right ? 

>     >

> 

>     right

> 

>     > According to the docs :

>     > "In standby mode, a restartpoint is also triggered

>     if checkpoint_segments log segments have been replayed since last

>     restartpoint and at least one checkpoint record has been replayed.

>     Restartpoints can't be performed more frequently than checkpoints in

>     the master because restartpoints can only be performed at checkpoint

>     records"

>     > so maybe I should decrease max_wal_size or even checkpoint_timeout

>     to force a restartpoint ? 

>     > During this gap (standby clone) 6-7 wals were generated on the primary

>     >

> 

> 

>     From what you posted earlier, you should in any case have hit a

>     checkpoint every 30 minutes. (That was also the assumption in the

>     previous messages. If that's not happening, then i would really

>     investigate.)

> 

>     That is, if during your cloning only a few WAL files were generated,

>     then it is not enough to trigger a checkpoint and you fallback to

>     the 30 minutes.

> 

>     I would not be bothered if i were you, but can always force a

>     checkpoint on the master issuing:

> 

>     CHECKPOINT ;

> 

>     at that stage, on the standby logs you will see the messages:

> 

>     restartpoint starting: ..

> 

>     restartpoint complete: ..

> 

> 

> 

>     regards,

> 

>     fabio pardi

> 

>     >

>     > ‫בתאריך יום ב׳, 27 במאי 2019 ב-14:04 מאת ‪Fabio Pardi‬‏

>     <‪f.pardi@xxxxxxxxxxxx <mailto:f.pardi@xxxxxxxxxxxx>

>     <mailto:f.pardi@xxxxxxxxxxxx <mailto:f.pardi@xxxxxxxxxxxx>>‬‏>:‬

>     >

>     >     If you did not even see this messages on your standby logs:

>     >

>     >     restartpoint starting: xlog

>     >

>     >     then it means that the checkpoint was even never started.

>     >

>     >     In that case, I have no clue.

>     >

>     >     Try to describe step by step how to reproduce the problem

>     together with

>     >     your setup and the version number of Postgres and repmgr, and

>     i might be

>     >     able to help you further.

>     >

>     >     regards,

>     >

>     >     fabio pardi

>     >

>     >     On 5/27/19 12:17 PM, Mariel Cherkassky wrote:

>     >     > standby_mode = 'on'

>     >     > primary_conninfo = 'host=X.X.X.X user=repmgr 

>     connect_timeout=10 '

>     >     > recovery_target_timeline = 'latest'

>     >     > primary_slot_name = repmgr_slot_1

>     >     > restore_command = 'rsync -avzhe ssh

>     >     > postgres@x.x.x.x:/var/lib/pgsql/archive/%f

>     /var/lib/pgsql/archive/%f ;

>     >     > gunzip < /var/lib/pgsql/archive/%f > %p'

>     >     > archive_cleanup_command = '/usr/pgsql-9.6/bin/pg_archivecleanup

>     >     > /var/lib/pgsql/archive %r'

>     >     >

>     >     > ‫בתאריך יום ב׳, 27 במאי 2019 ב-12:29 מאת ‪Fabio Pardi‬‏

>     >     > <‪f.pardi@xxxxxxxxxxxx <mailto:f.pardi@xxxxxxxxxxxx>

>     <mailto:f.pardi@xxxxxxxxxxxx <mailto:f.pardi@xxxxxxxxxxxx>>

>     <mailto:f.pardi@xxxxxxxxxxxx <mailto:f.pardi@xxxxxxxxxxxx>

>     <mailto:f.pardi@xxxxxxxxxxxx <mailto:f.pardi@xxxxxxxxxxxx>>>‬‏>:‬

>     >     >

>     >     >     Hi Mariel,

>     >     >

>     >     >     let s keep the list in cc...

>     >     >

>     >     >     settings look ok.

>     >     >

>     >     >     what's in the recovery.conf file then?

>     >     >

>     >     >     regards,

>     >     >

>     >     >     fabio pardi

>     >     >

>     >     >     On 5/27/19 11:23 AM, Mariel Cherkassky wrote:

>     >     >     > Hey,

>     >     >     > the configuration is the same as in the primary : 

>     >     >     > max_wal_size = 2GB

>     >     >     > min_wal_size = 1GB

>     >     >     > wal_buffers = 16MB

>     >     >     > checkpoint_completion_target = 0.9

>     >     >     > checkpoint_timeout = 30min

>     >     >     >

>     >     >     > Regarding your question, I didnt see this message

>     (consistent recovery

>     >     >     > state reached at), I guess thats why the secondary

>     isnt avaialble

>     >     >     yet..

>     >     >     >

>     >     >     > Maybe I'm wrong, but what I understood from the

>     documentation- restart

>     >     >     > point is generated only after the secondary had a

>     checkpoint wihch

>     >     >     means

>     >     >     > only after 30 minutes or after max_wal_size is reached

>     ?  But

>     >     >     still, why

>     >     >     > wont the secondary reach a consisteny recovery state

>     (does it

>     >     >     requires a

>     >     >     > restart point to be generated ? )

>     >     >     >

>     >     >     >

>     >     >     > ‫בתאריך יום ב׳, 27 במאי 2019 ב-12:12 מאת ‪Fabio Pardi‬‏

>     >     >     > <‪f.pardi@xxxxxxxxxxxx <mailto:f.pardi@xxxxxxxxxxxx>

>     <mailto:f.pardi@xxxxxxxxxxxx <mailto:f.pardi@xxxxxxxxxxxx>>

>     <mailto:f.pardi@xxxxxxxxxxxx <mailto:f.pardi@xxxxxxxxxxxx>

>     <mailto:f.pardi@xxxxxxxxxxxx <mailto:f.pardi@xxxxxxxxxxxx>>>

>     >     >     <mailto:f.pardi@xxxxxxxxxxxx

>     <mailto:f.pardi@xxxxxxxxxxxx> <mailto:f.pardi@xxxxxxxxxxxx

>     <mailto:f.pardi@xxxxxxxxxxxx>> <mailto:f.pardi@xxxxxxxxxxxx

>     <mailto:f.pardi@xxxxxxxxxxxx> <mailto:f.pardi@xxxxxxxxxxxx

>     <mailto:f.pardi@xxxxxxxxxxxx>>>>‬‏>:‬

>     >     >     >

>     >     >     >     Hi Mariel,

>     >     >     >

>     >     >     >     if i m not wrong, on the secondary you will see

>     the messages you

>     >     >     >     mentioned when a checkpoint happens.

>     >     >     >

>     >     >     >     What are checkpoint_timeout and max_wal_size on

>     your standby?

>     >     >     >

>     >     >     >     Did you ever see this on your standby log?

>     >     >     >

>     >     >     >     "consistent recovery state reached at .."

>     >     >     >

>     >     >     >

>     >     >     >     Maybe you can post your whole configuration of

>     your standby

>     >     >     for easier

>     >     >     >     debug.

>     >     >     >

>     >     >     >     regards,

>     >     >     >

>     >     >     >     fabio pardi

>     >     >     >

>     >     >     >

>     >     >     >

>     >     >     >

>     >     >     >     On 5/27/19 10:49 AM, Mariel Cherkassky wrote:

>     >     >     >     > Hey,

>     >     >     >     > PG 9.6, I have a standalone configured. I tried

>     to start up a

>     >     >     >     secondary,

>     >     >     >     > run standby clone (repmgr). The clone process

>     took 3 hours

>     >     >     and during

>     >     >     >     > that time wals were generated(mostly because of the

>     >     >     >     checkpoint_timeout).

>     >     >     >     > As a result of that, when I start the secondary

>     ,I see that the

>     >     >     >     > secondary keeps getting the wals but I dont see

>     any messages

>     >     >     that

>     >     >     >     > indicate that the secondary tried to replay the

>     wals. 

>     >     >     >     > messages that i see :

>     >     >     >     > receiving incremental file list

>     >     >     >     > 000000010000377B000000DE

>     >     >     >     >

>     >     >     >     > sent 30 bytes  received 4.11M bytes  8.22M bytes/sec

>     >     >     >     > total size is 4.15M  speedup is 1.01

>     >     >     >     > 2019-05-22 12:48:10 EEST  60942  LOG:  restored

>     log file

>     >     >     >     > "000000010000377B000000DE" from archive

>     >     >     >     > 2019-05-22 12:48:11 EEST db63311  FATAL:  the

>     database system is

>     >     >     >     starting up

>     >     >     >     > 2019-05-22 12:48:12 EEST db63313  FATAL:  the

>     database system is

>     >     >     >     > starting up 

>     >     >     >     >

>     >     >     >     > I was hoping to see the following messages

>     (taken from a

>     >     >     different

>     >     >     >     > machine) : 

>     >     >     >     > 2019-05-27 01:15:37 EDT  7428  LOG:

>      restartpoint starting: time

>     >     >     >     > 2019-05-27 01:16:18 EDT  7428  LOG:

>      restartpoint complete:

>     >     >     wrote 406

>     >     >     >     > buffers (0.2%); 1 transaction log file(s) added,

>     0 removed, 0

>     >     >     >     recycled;

>     >     >     >     > write=41.390 s, sync=0.001 s, total=41.582 s;

>     sync file

>     >     >     >     > s=128, longest=0.000 s, average=0.000 s;

>     distance=2005 kB,

>     >     >     >     estimate=2699 kB

>     >     >     >     > 2019-05-27 01:16:18 EDT  7428  LOG:  recovery

>     restart point at

>     >     >     >     4/D096C4F8

>     >     >     >     >

>     >     >     >     > My primary settings(wals settings) : 

>     >     >     >     > wal_buffers = 16MB

>     >     >     >     > checkpoint_completion_target = 0.9

>     >     >     >     > checkpoint_timeout = 30min

>     >     >     >     >

>     >     >     >     > Any idea what can explain why the secondary

>     doesnt replay

>     >     >     the wals ?

>     >     >     >

>     >     >     >

>     >     >

>     >

>