Search Postgresql Archives

Re: checkpoints/bgwriter tuning verification

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 29 Oct 2009, Vick Khera wrote:

Greg, do you have a performance tuning book? If so, I really want to buy it! Your articles are awesome.

Give me a few more months...

basically, the next checkpoint starts within a few seconds of the prior one completing.

That's the expected behavior with a high setting for the completion target. Each checkpoint finishes just before the next one needs to start. That's the most you can spread them out, and more spreading generally equals better behavior. Usually; always exceptions of course.

Based on Greg's article and the above number showing that most
checkpoints are triggered by running out of WAL segments, I should
increase my checkpoint_buffers.  Also, based on the article, I should
increase the bgwriter_lru_maxpages (currently at default 100).

You might actually want to decrease bgwriter_lru_maxpages to 0. The current background writer designs presumes you have some spare I/O capacity and want to trade off some write overhead for possibly lower latency in client backends. If you're at the point where the disks aren't keeping up with the checkpoint disk I/O, this could turn counter-productive.

My plan is to bump checkpoint_segments to 48 and reduce
checkpoint_completion_target to 0.7, and bump the
bgwriter_lru_maxpages to 500.

In your situation, I would increase checkpoint_segments to 64, keep the completion target at 0.9, and decrease bgwriter_lru_maxpages to 0 so that feature is turned off altogether. If you're still not happy, you can try dropping the completion target too; I've heard one report of that logic not working out so well in practice, where lower settings actually resulted in less of an I/O spike. I would worry about the OS tuning first though, which brins us to:

Can the checkpoint operation actually cause the DB to stop responding
for a few seconds at a time?  That seems to be what I observe.
Sometimes for 5 or more seconds one transaction will just stall.

There are two ways that that writes can hang:

1) You've gotten to the point in the checkpoint cycle where it's calling fsync to flush everything out of the filesystem. At this point you could potentially have a big chunk of data that needs to go out to disk, and any other client that needs to write something (or needs a resource locked by a writer) is stuck behind that.

2) The entire OS write cache is filled. Once that happens, if a client tries to write something else, the OS will force it to nap until there's space again to hold that write.

How that all works really depends on the OS. I know that under Linux, the way this is all handled is really bad, and can easily lead to multi-second hangs. Generally you can watch the writes accumulate by looking at /proc/meminfo. I wrote an example showing how the problem appears and what I did to improve long hangs one server at http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html and it refers to a long theory page on Linux's pdflush on my web page.

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux