On 5/29/19 10:39 AM, Mariel Cherkassky wrote:
> Is there any message that indicates that the secondary replayed a
> specific WAL? "restored 00000..." means that the restore_command
> succeeded, but there isn't any proof that it replayed the WAL.

I believe that the message "restored log file..." is an acknowledgement that the file has been restored and the WAL file applied. At least that's what I know and understand from the source code.

> My theory regarding the issue:
> It seems that my customer stopped the db 20 minutes after the clone
> had finished. During those 20 minutes the secondary didn't get enough
> WAL records (6 WAL files), so it didn't reach max_wal_size. My
> checkpoint_timeout is set to 30 minutes, therefore there wasn't any
> checkpoint. As a result, the secondary didn't reach a restart
> point. Does that sound reasonable?

It could be, but I do not see the point here.

> So basically, if I clone a small primary db, the secondary would reach a
> restart point only if it reached a checkpoint (checkpoint_timeout or
> max_wal_size). However, I have cloned many small dbs and saw that it
> takes a second to start the secondary (which means that a restart point
> was reached). So what am I missing?

The restart point is reached in relation to checkpoints. But when all available WAL files have been applied, you should be able to connect to your standby, regardless of when the last checkpoint occurred, provided your standby allows that (hot_standby = on).

regards,

fabio pardi

> On Wed, May 29, 2019 at 11:20 Fabio Pardi <f.pardi@xxxxxxxxxxxx> wrote:
>
> On 5/29/19 9:20 AM, Mariel Cherkassky wrote:
> > First of all, thanks Fabio.
> > I think that I'm missing something. In the next questions I'm not
> > talking about streaming replication, rather about recovery:
> >
> > 1. When the secondary gets the WALs from the primary, it tries to
> > replay them, correct?
>
> correct
>
> > 2. By replaying, it just goes over the WAL records and runs them on
> > the secondary?
>
> correct
>
> > 3. Are all those changes saved in the shared buffers of the secondary,
> > or are the changes immediately done on the data file blocks?
>
> the changes are not saved to your datafile yet. That happens at
> checkpoint time.
>
> > 4. The secondary will need a checkpoint in order to flush those
> > changes to the data files and in order to reach a restart point?
>
> yes
>
> > So basically, if I had a checkpoint during the clone, the secondary
> > should also have a checkpoint when I recover the secondary, right?
>
> correct. Even after being in sync with the master, if you restart
> Postgres on the standby, it will then re-apply the WAL files from the
> last checkpoint.
>
> In the logfile of the standby, you will see as many messages reporting
> "restored log file" as WAL files were produced since the last
> checkpoint.
>
> Hope it helps to clarify.
>
> regards,
>
> fabio pardi
>
> > On Tue, May 28, 2019 at 13:54 Fabio Pardi <f.pardi@xxxxxxxxxxxx> wrote:
> >
> > Hi Mariel,
> >
> > please keep the list posted. When you reply, use 'reply all'. That
> > will maybe help others in the community and you might also get more
> > help from others.
> >
> > answers are inline here below
> >
> > On 28/05/2019 10:54, Mariel Cherkassky wrote:
> > > I have pg 9.6, repmgr version 4.3.
> > > I see in the logs that WAL files are restored:
> > > 2019-05-22 12:35:12 EEST 60942 LOG: restored log file
> > > "000000010000377B000000DB" from archive
> > > That means that the restore_command worked, right?
> >
> > right
> >
> > > According to the docs:
> > > "In standby mode, a restartpoint is also triggered if
> > > checkpoint_segments log segments have been replayed since the last
> > > restartpoint and at least one checkpoint record has been replayed.
> > > Restartpoints can't be performed more frequently than checkpoints in
> > > the master because restartpoints can only be performed at checkpoint
> > > records"
> > > So maybe I should decrease max_wal_size or even checkpoint_timeout
> > > to force a restartpoint?
> > > During this gap (standby clone) 6-7 WALs were generated on the
> > > primary.
> >
> > From what you posted earlier, you should in any case have hit a
> > checkpoint every 30 minutes. (That was also the assumption in the
> > previous messages. If that's not happening, then I would really
> > investigate.)
> >
> > That is, if during your cloning only a few WAL files were generated,
> > then it is not enough to trigger a checkpoint and you fall back to
> > the 30 minutes.
> >
> > I would not be bothered if I were you, but you can always force a
> > checkpoint on the master issuing:
> >
> > CHECKPOINT ;
> >
> > at that stage, on the standby logs you will see the messages:
> >
> > restartpoint starting: ..
> >
> > restartpoint complete: ..
> >
> > regards,
> >
> > fabio pardi
> >
> > > On Mon, May 27, 2019 at 14:04 Fabio Pardi <f.pardi@xxxxxxxxxxxx> wrote:
> > >
> > > If you did not even see these messages in your standby logs:
> > >
> > > restartpoint starting: xlog
> > >
> > > then it means that the checkpoint was never even started.
> > >
> > > In that case, I have no clue.
> > >
> > > Try to describe step by step how to reproduce the problem together
> > > with your setup and the version numbers of Postgres and repmgr, and
> > > I might be able to help you further.
> > >
> > > regards,
> > >
> > > fabio pardi
> > >
> > > On 5/27/19 12:17 PM, Mariel Cherkassky wrote:
> > > > standby_mode = 'on'
> > > > primary_conninfo = 'host=X.X.X.X user=repmgr connect_timeout=10 '
> > > > recovery_target_timeline = 'latest'
> > > > primary_slot_name = repmgr_slot_1
> > > > restore_command = 'rsync -avzhe ssh
> > > > postgres@x.x.x.x:/var/lib/pgsql/archive/%f /var/lib/pgsql/archive/%f ;
> > > > gunzip < /var/lib/pgsql/archive/%f > %p'
> > > > archive_cleanup_command = '/usr/pgsql-9.6/bin/pg_archivecleanup
> > > > /var/lib/pgsql/archive %r'
> > > >
> > > > On Mon, May 27, 2019 at 12:29 Fabio Pardi <f.pardi@xxxxxxxxxxxx> wrote:
> > > >
> > > > Hi Mariel,
> > > >
> > > > let's keep the list in cc...
> > > >
> > > > settings look ok.
> > > >
> > > > what's in the recovery.conf file then?
> > > >
> > > > regards,
> > > >
> > > > fabio pardi
> > > >
> > > > On 5/27/19 11:23 AM, Mariel Cherkassky wrote:
> > > > > Hey,
> > > > > the configuration is the same as on the primary:
> > > > > max_wal_size = 2GB
> > > > > min_wal_size = 1GB
> > > > > wal_buffers = 16MB
> > > > > checkpoint_completion_target = 0.9
> > > > > checkpoint_timeout = 30min
> > > > >
> > > > > Regarding your question, I didn't see this message ("consistent
> > > > > recovery state reached at"), I guess that's why the secondary
> > > > > isn't available yet..
> > > > >
> > > > > Maybe I'm wrong, but what I understood from the documentation is
> > > > > that a restart point is generated only after the secondary had a
> > > > > checkpoint, which means only after 30 minutes or after
> > > > > max_wal_size is reached? But still, why won't the secondary
> > > > > reach a consistent recovery state (does it require a restart
> > > > > point to be generated?)
> > > > >
> > > > > On Mon, May 27, 2019 at 12:12 Fabio Pardi <f.pardi@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > Hi Mariel,
> > > > >
> > > > > if I'm not wrong, on the secondary you will see the messages you
> > > > > mentioned when a checkpoint happens.
> > > > >
> > > > > What are checkpoint_timeout and max_wal_size on your standby?
> > > > >
> > > > > Did you ever see this in your standby log?
> > > > >
> > > > > "consistent recovery state reached at .."
> > > > >
> > > > > Maybe you can post the whole configuration of your standby for
> > > > > easier debugging.
> > > > >
> > > > > regards,
> > > > >
> > > > > fabio pardi
> > > > >
> > > > > On 5/27/19 10:49 AM, Mariel Cherkassky wrote:
> > > > > > Hey,
> > > > > > PG 9.6, I have a standalone configured. I tried to start up a
> > > > > > secondary and ran standby clone (repmgr). The clone process
> > > > > > took 3 hours and during that time WALs were generated (mostly
> > > > > > because of the checkpoint_timeout).
> > > > > > As a result, when I start the secondary, I see that the
> > > > > > secondary keeps getting the WALs but I don't see any messages
> > > > > > that indicate that the secondary tried to replay the WALs.
> > > > > > Messages that I see:
> > > > > > receiving incremental file list
> > > > > > 000000010000377B000000DE
> > > > > >
> > > > > > sent 30 bytes received 4.11M bytes 8.22M bytes/sec
> > > > > > total size is 4.15M speedup is 1.01
> > > > > > 2019-05-22 12:48:10 EEST 60942 LOG: restored log file
> > > > > > "000000010000377B000000DE" from archive
> > > > > > 2019-05-22 12:48:11 EEST db63311 FATAL: the database system is
> > > > > > starting up
> > > > > > 2019-05-22 12:48:12 EEST db63313 FATAL: the database system is
> > > > > > starting up
> > > > > >
> > > > > > I was hoping to see the following messages (taken from a
> > > > > > different machine):
> > > > > > 2019-05-27 01:15:37 EDT 7428 LOG: restartpoint starting: time
> > > > > > 2019-05-27 01:16:18 EDT 7428 LOG: restartpoint complete:
> > > > > > wrote 406 buffers (0.2%); 1 transaction log file(s) added,
> > > > > > 0 removed, 0 recycled; write=41.390 s, sync=0.001 s,
> > > > > > total=41.582 s; sync files=128, longest=0.000 s,
> > > > > > average=0.000 s; distance=2005 kB, estimate=2699 kB
> > > > > > 2019-05-27 01:16:18 EDT 7428 LOG: recovery restart point at
> > > > > > 4/D096C4F8
> > > > > >
> > > > > > My primary settings (WAL settings):
> > > > > > wal_buffers = 16MB
> > > > > > checkpoint_completion_target = 0.9
> > > > > > checkpoint_timeout = 30min
> > > > > >
> > > > > > Any idea what can explain why the secondary doesn't replay
> > > > > > the WALs?
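The central question in the thread, whether the standby actually replayed the restored WAL rather than merely fetching the files, can be checked directly on a hot standby by querying the recovery information functions. A minimal diagnostic sketch, using the PostgreSQL 9.6 spellings of the function names (the first two were renamed to pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() in version 10):

```sql
-- Run these on the standby. Connections are only accepted when
-- hot_standby = on and a consistent recovery state has been reached.

SELECT pg_is_in_recovery();              -- true while the server is a standby

-- Last WAL location received by the walreceiver. Returns NULL when
-- recovering purely from the archive via restore_command (as in this
-- thread), since no streaming replication is in progress.
SELECT pg_last_xlog_receive_location();

-- Last WAL location actually replayed. Watching this value advance is
-- the proof that WAL is being applied, not merely restored.
SELECT pg_last_xlog_replay_location();

-- Commit timestamp of the last transaction replayed during recovery.
SELECT pg_last_xact_replay_timestamp();
```

If the standby still rejects connections with "FATAL: the database system is starting up", as in the logs above, these queries cannot be run yet: either hot_standby is off or recovery has not reached a consistent state.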