Re: PG 8.3 and large shared buffer settings

On Sat, 26 Sep 2009, Jeff Janes wrote:

> On Sat, Sep 26, 2009 at 8:19 AM, Greg Smith <gsmith@xxxxxxxxxxxxx> wrote:

>> Another problem spot is checkpoints.  If you dirty a very large buffer
>> cache, that whole thing will have to get dumped to disk eventually, and on
>> some workloads people have found they have to reduce shared_buffers
>> specifically to keep this from being too painful.

> Is this the case even if checkpoint_completion_target is set close to 1.0?

Sure. checkpoint_completion_target spreads each checkpoint's writes across more of the interval between checkpoints, but it alone doesn't change how long that interval is. By default you'll get a checkpoint every five minutes, and on active systems they can come every few seconds if you're not aggressive about increasing checkpoint_segments.
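To see how short the interval can get, here's a rough sketch of the trigger arithmetic (the 16MB WAL segment size is standard; the WAL write rates are assumed numbers for illustration). A checkpoint is requested when checkpoint_segments WAL segments have filled since the last one, or checkpoint_timeout elapses, whichever comes first:

```python
# Rough sketch: estimate how often checkpoints fire. A checkpoint is
# requested after checkpoint_segments WAL segments (16MB each) have
# been filled, or after checkpoint_timeout, whichever comes first.

WAL_SEGMENT_MB = 16

def checkpoint_interval_seconds(checkpoint_segments, wal_rate_mb_per_s,
                                checkpoint_timeout_s=300):
    """Estimated seconds between checkpoints."""
    by_volume = checkpoint_segments * WAL_SEGMENT_MB / wal_rate_mb_per_s
    return min(by_volume, checkpoint_timeout_s)

# Default checkpoint_segments=3 with a (hypothetical) 10MB/s WAL rate:
# a checkpoint every ~5 seconds, not every 5 minutes.
print(checkpoint_interval_seconds(3, 10.0))    # 4.8
print(checkpoint_interval_seconds(64, 10.0))   # 102.4
```

Raising checkpoint_segments stretches the interval back out, which is exactly why busy systems have to increase it.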

Some quick math gives an idea of the scale of the problem. A single cheap disk can write random I/O (which checkpoint writes often are) at 1-2MB/s; call it 100MB/minute. That means that in 5 minutes, a single-disk system might be hard pressed to write even 500MB of data out. But you can easily dirty 500MB in seconds nowadays. Now imagine shared_buffers is 40GB and you've dirtied a bunch of it; how long will that take to clear even on a fast RAID system? It won't be quick, and the whole system will grind to a halt at the end of the checkpoint as all the buffered writes queued up there are forced out.
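The same back-of-envelope math in a couple of lines (the RAID write rate here is an assumed figure, not a measurement):

```python
# Back-of-envelope: how long does it take to flush a given amount of
# dirty buffer cache at a given sustained write rate?

def flush_time_minutes(dirty_mb, write_rate_mb_per_min):
    """Minutes needed to write dirty_mb of dirty buffers to disk."""
    return dirty_mb / write_rate_mb_per_min

# One cheap disk doing random I/O: ~100MB/minute.
print(flush_time_minutes(500, 100))         # 5.0 -- barely fits a checkpoint
# 40GB shared_buffers, half dirty, on a RAID assumed to sustain 1GB/min:
print(flush_time_minutes(20 * 1024, 1024))  # 20.0
```

Even with an optimistic write rate, a mostly-dirty multi-gigabyte shared_buffers can't be flushed inside one checkpoint interval.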

> If you dirty buffers fast enough to dirty most of a huge shared_buffers area
> between checkpoints, then it seems like lowering the shared_buffers wouldn't
> reduce the amount of I/O needed, it would just shift the I/O from checkpoints
> to the backends themselves.

What's even worse is that backends can be writing data into the OS buffer cache between checkpoints too, and all of that is also forced to complete before the checkpoint can finish. If you're not careful, the checkpoint can start with the whole OS cache already filled with backend writes that will slow the checkpoint's own writes down.

Because disks are slow, you need to get writes to the OS as soon as feasible, so it has more time to work on them, reorder them for efficient writing, etc.

Ultimately, the sooner you get I/O to the OS cache to write, the better, *unless* you're going to write that same block again before it must go to disk. Normally you want buffers that aren't accessed often to get written out to disk early rather than lingering until checkpoint time; there's nothing wrong with a backend doing a write if that block wasn't going to be used again soon. The ideal setup from a latency perspective is to size shared_buffers just large enough to hold the things you write to regularly, but not so big that it caches every write.

> It looks like checkpoint_completion_target was introduced in 8.3.0

Correct. Before then, you had no way to reduce checkpoint overhead other than using very small settings for shared_buffers, particularly if you cranked the old background writer up so that it wrote lots of redundant information too (that was the main result of "tuning" it on versions before 8.3).

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
