
Re: Limit of bgwriter_lru_maxpages of max. 1000?


On Fri, 2 Oct 2009, Scott Marlowe wrote:

> I found that lowering checkpoint completion target was what helped.
> Does that seem counter-intuitive to you?

Generally, but there are plenty of ways you can get into a state where a short but not immediate checkpoint is better. For example, consider a case where your buffer cache is filled with really random stuff. There's a sorting horizon in effect, where your OS and/or controller makes decisions about what order to write things based on the data it already has around, not really knowing what's coming in the near future.

Let's say you've got 256MB of cache in the disk controller, you have 1GB of buffer cache to write out, and there's 8GB of RAM in the server so it can cache the whole write. If you wrote it out in a big burst, the OS would elevator sort things and feed them to the controller in disk order. Very efficient, one pass over the disk to write everything out.

But if you broke that up into 256MB write pieces on the database side instead, pausing after each chunk was written, the OS would only be sorting across 256MB at a time, and would basically fill the controller cache with that before it saw the larger picture. The disk controller can then end up making seek decisions within that small a planning window that are not really optimal, making more passes over the disk to write the same data out. If the timing between the DB write cache and the OS is pathologically out of sync here, the result can end up being slower than if you had just written out bigger chunks in the first place. This is one reason I'd like to see fsync calls happen earlier and more evenly than they do now, to reduce these edge cases.
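The effect is easy to see in a toy model. This sketch (my own illustration, not anything from PostgreSQL; the `sweeps` function and all numbers are made up for the example) counts how many elevator sweeps a disk head would make when the same scattered blocks are sorted globally versus in small isolated chunks:

```python
import random

def sweeps(order):
    """Count ascending sweeps the head makes writing blocks in this order.
    Each time the next block is behind the head, a new sweep starts."""
    count = 1
    for prev, cur in zip(order, order[1:]):
        if cur < prev:
            count += 1
    return count

random.seed(42)
blocks = random.sample(range(100_000), 4096)   # scattered dirty blocks

# OS sees the whole burst: one big elevator sort, one pass over the disk.
full_sort = sorted(blocks)

# OS only ever sees 256 blocks at a time: each chunk is sorted in
# isolation, so every chunk boundary forces a seek back to the start.
chunked = []
for i in range(0, len(blocks), 256):
    chunked.extend(sorted(blocks[i:i + 256]))

print(sweeps(full_sort))   # 1
print(sweeps(chunked))     # roughly one sweep per chunk
```

The global sort makes a single pass; the chunked version makes about one pass per chunk, which is the "small planning window" penalty described above.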

The usual approach I take in this situation is to reduce the amount of write caching the OS does, so at least things get more predictable. A giant write cache always gives the best average performance, but the worst-case behavior increases at the same time.
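On Linux, for example, that usually means shrinking the kernel's dirty-page cache via the vm.* sysctls. The knobs are real, but the values below are purely illustrative, not recommendations:

```shell
# /etc/sysctl.conf fragment -- shrink the kernel write-back cache so
# flushes start sooner and stay smaller (values illustrative only)
vm.dirty_background_ratio = 1   # kick off background writeback at 1% of RAM dirty
vm.dirty_ratio = 5              # block writers outright at 5% of RAM dirty
```

Lower values trade some peak throughput for flushes that are smaller and arrive on a more predictable schedule.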

There was a patch floating around at one point that sorted all the checkpoint writes by block order, which would reduce how likely it is you'll end up in one of these odd cases. The benefit turned out to be hard to nail down, though, because in the typical case the OS caching here trumps any I/O scheduling you try to do in user land, and it's hard to repeatably generate scattered data in a benchmark situation.
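The idea behind that patch can be sketched in a few lines. This is my own illustration of the technique, not PostgreSQL's actual code; the `DirtyBuffer` type and field names are invented for the example:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DirtyBuffer:
    """Stand-in for a dirty shared buffer (names are illustrative)."""
    tablespace: int
    relation: int
    block: int        # block number within the relation file

def checkpoint_order(buffers):
    """Sort dirty buffers by physical location, so the writes reach the
    kernel already in roughly disk order instead of buffer-pool order."""
    return sorted(buffers, key=lambda b: (b.tablespace, b.relation, b.block))

dirty = [
    DirtyBuffer(1, 16384, 7),
    DirtyBuffer(1, 16385, 0),
    DirtyBuffer(1, 16384, 2),
]
for b in checkpoint_order(dirty):
    print(b.relation, b.block)
```

Handing the kernel pre-sorted writes makes its elevator's job trivial, but as noted above, the OS usually re-sorts anyway, which is why the win was hard to measure.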

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
