On Thu, 29 Oct 2009, Vick Khera wrote:
Greg, do you have a performance tuning book? If so, I really want to
buy it! Your articles are awesome.
Give me a few more months...
basically, the next checkpoint starts within a few seconds of the prior
one completing.
That's the expected behavior with a high setting for the completion
target. Each checkpoint finishes just before the next one needs to start.
That's the most you can spread them out, and more spreading generally
equals better behavior. Usually, anyway; there are always exceptions, of course.
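If you want to confirm what's triggering your checkpoints, pg_stat_bgwriter keeps a count of each kind (a quick sketch, using the column names from 8.3 and later):

    SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

checkpoints_req growing much faster than checkpoints_timed means you're filling checkpoint_segments worth of WAL before checkpoint_timeout ever expires.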
Based on Greg's article and the above number showing that most
checkpoints are triggered by running out of WAL segments, I should
increase my checkpoint_segments. Also, based on the article, I should
increase the bgwriter_lru_maxpages (currently at default 100).
You might actually want to decrease bgwriter_lru_maxpages to 0. The
current background writer design presumes you have some spare I/O
capacity and want to trade off some write overhead for possibly lower
latency in client backends. If you're at the point where the disks aren't
keeping up with the checkpoint disk I/O, this could turn
counter-productive.
My plan is to bump checkpoint_segments to 48 and reduce
checkpoint_completion_target to 0.7, and bump the
bgwriter_lru_maxpages to 500.
In your situation, I would increase checkpoint_segments to 64, keep the
completion target at 0.9, and decrease bgwriter_lru_maxpages to 0 so that
feature is turned off altogether. If you're still not happy, you can try
dropping the completion target too; I've heard one report of that logic
not working out so well in practice, where lower settings actually
resulted in less of an I/O spike. I would worry about the OS tuning first
though; more on that below.
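To collect the settings side of this in one place, here's roughly what I'm suggesting in postgresql.conf (just a sketch of the values discussed above, not a universal recommendation):

    checkpoint_segments = 64
    checkpoint_completion_target = 0.9
    bgwriter_lru_maxpages = 0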
Can the checkpoint operation actually cause the DB to stop responding
for a few seconds at a time? That seems to be what I observe.
Sometimes for 5 or more seconds one transaction will just stall.
There are two ways that writes can hang:
1) You've gotten to the point in the checkpoint cycle where it's calling
fsync to flush everything out of the filesystem. At this point you could
potentially have a big chunk of data that needs to go out to disk, and any
other client that needs to write something (or needs a resource locked by
a writer) is stuck behind that. (There's a note below this list on how to spot that in the logs.)
2) The entire OS write cache is filled. Once that happens, if a client
tries to write something else, the OS will force it to nap until there's
space again to hold that write.
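For case (1), turning on checkpoint logging will show you how long the write and sync phases of each checkpoint are taking (the exact log format varies a bit by version):

    log_checkpoints = on

Then look at the write= and sync= times in the "checkpoint complete" lines; a large sync time points at this problem.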
How the write caching side of this works really depends on the OS. I know that under Linux, the
way this is all handled is really bad, and can easily lead to multi-second
hangs. Generally you can watch the writes accumulate by looking at
/proc/meminfo. I wrote an example showing how the problem appears and
what I did to reduce the long hangs on one server at
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html and
it links to a longer page about pdflush theory on my site.
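If you want to watch the problem develop, something like this works on Linux (just a sketch; the sysctl values here are illustrative, not a recommendation):

    # Watch dirty data accumulate in the OS write cache
    watch -n 1 'grep -E "^(Dirty|Writeback)" /proc/meminfo'

    # Lower the thresholds where the kernel starts writing dirty data out
    sysctl -w vm.dirty_background_ratio=1
    sysctl -w vm.dirty_ratio=10

Lowering the dirty_* ratios limits how much write backlog can build up before the kernel starts forcing it out, which trades some throughput for shorter worst-case stalls.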
--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD