Re: WAL archiving idle database

Tom Lane <tgl@xxxxxxxxxxxxx> · Fri, 26 Oct 2007 18:47:35 -0400

Jeff Davis <pgsql@xxxxxxxxxxx> writes:
> What's happening is that you have a checkpoint_timeout of 5 minutes, and
> that checkpoint must write a checkpoint record in the WAL, prompting the
> archiving.

> If you want it to happen less frequently, it's often safe to have
> checkpoint timeout set to something larger by a reasonable amount. 

I think you're confusing checkpoint_timeout and archive_timeout...

I seem to recall this behavior having been discussed before, but I
can't find it in the archives right now.  What is happening is that
after each checkpoint_timeout, we test to see if we need to write
a new checkpoint; which is determined by whether anything's been
inserted into WAL since the start of the last checkpoint.  And after
each archive_timeout, we test to see if we need to flush the current WAL
segment out to the archive; which is determined by whether the write
pointer is currently exactly at the start of a segment or not.

Which would be fine, except that the "has anything been inserted
since last checkpoint" test is actually done by seeing if the WAL
insert pointer has moved.  Which it will have, if we did an archive
flush.  And that means that each of these activities makes it look
to the other one like something has happened, and so you get a
checkpoint record every checkpoint_timeout seconds, and then we flush
the entire WAL file (containing only that record), even if the database
is in reality completely idle.  Lather, rinse, repeat.

In the prior discussion that I seem to remember, we didn't think of
a decent solution, and it kinda fell off the radar since zero-activity
isn't too interesting to a lot of folks.  However, chewing on it again
I think I've come up with a good idea that will fix this and actually
simplify the code a bit:

* Add a boolean flag insertedXLog to XLogCtlInsert, which means "at
least one WAL record has been inserted since start of last checkpoint".
Also add a flag completedCkpt somewhere in XLogCtlData, which means
"checkpoint successfully completed"; this second flag is only used by
checkpoint so it can be considered as being protected by the
CheckpointLock.  At startup we can initialize insertedXLog = false,
completedCkpt = true.

* XLogInsert sets insertedXLog to true while holding WALInsertLock,
*except* when inserting either a checkpoint record or an xlog switch
record; in those cases it doesn't change the flag.

* CreateCheckpoint replaces its current rather complex test (lines
5693-5703 in CVS-tip xlog.c) with "if insertedXLog is clear and
completedCkpt is set, we need not checkpoint".  If it does have
to perform a checkpoint, it clears both flags before releasing
WALInsertLock.

* After successful completion of a checkpoint, completedCkpt gets set.

Because insertedXLog is cleared at the same time the checkpoint's REDO
pointer is determined, this will correctly implement the requirement of
detecting whether anything has been inserted since the last REDO point.
This replaces the current indirect test involving comparing the last
checkpoint's REDO pointer to its own address.  However we have to not
set insertedXLog when we finally do insert the checkpoint record, thus
the special case is needed in XLogInsert.  The other special case of
ignoring xlog switch is what's needed to fix the bug, and is obviously
OK because an xlog switch doesn't represent a checkpointable change.

The reason we need the completedCkpt flag is that if a checkpoint fails
partway through, it would nonetheless have cleared insertedXLog, and
we don't want that to cause us to not retry the checkpoint next time.

This is slightly warty but it certainly seems a lot clearer than the
current test in lines 5693-5703.  The couple of lines to be added to
XLogInsert should have negligible performance impact.

Comments?

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
       message can get through to the mailing list cleanly