On 10/10/2011 12:14 PM, Leonardo Francalanci wrote:
database makes the fsync call, and suddenly the OS wants to flush 2-6GB of data
straight to disk. Without that background trickle, you now have a flood that
only the highest-end disk controller or a backing-store full of SSDs or PCIe
NVRAM could ever hope to absorb.
Isn't checkpoint_completion_target supposed to deal exactly with that problem?
checkpoint_completion_target spreads out the writes to disk.
PostgreSQL doesn't make any attempt yet to spread out the sync calls.
On a busy server, what can happen is that the whole OS write cache fills
with dirty data--none of which is written out to disk because of the
high kernel threshold--and then it all slams onto disk fast once the
checkpoint starts executing sync calls. Lowering the size of the Linux
write cache helps with that a lot, but can't quite eliminate the problem.
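
To put rough numbers on how much dirty data can pile up before the kernel
acts, here is a minimal sketch. It assumes the "kernel threshold" above maps
to the Linux sysctls vm.dirty_background_ratio and vm.dirty_ratio, and it
approximates the kernel's "dirtyable memory" by total RAM, so treat the
results as upper bounds. The function name and the example values are
illustrative only.

```python
GIB = 1024 ** 3


def dirty_thresholds(total_ram_bytes, dirty_background_ratio, dirty_ratio):
    """Estimate how many dirty bytes can accumulate in the OS write cache.

    Assumption: thresholds are taken as a percentage of total RAM. The
    real kernel uses "dirtyable" memory, which is somewhat smaller.
    Returns (background_bytes, blocking_bytes):
      background_bytes: level at which background writeback starts
      blocking_bytes:   level at which writers are forced to flush
    """
    background = total_ram_bytes * dirty_background_ratio // 100
    blocking = total_ram_bytes * dirty_ratio // 100
    return background, blocking


if __name__ == "__main__":
    # Illustrative values: a 64 GiB server with ratios of 10 and 20.
    bg, blk = dirty_thresholds(64 * GIB, 10, 20)
    print(f"background writeback starts near {bg / GIB:.1f} GiB")
    print(f"writers block near {blk / GIB:.1f} GiB")
```

On a box like that, several GB of dirty data can sit in the cache before
anything is pushed out, which is the flood a checkpoint's sync calls then
release. Lowering either ratio shrinks those numbers proportionally.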
Plus: if 2-6GB is too much, why not decrease checkpoint_segments? Or
checkpoint_timeout?
Making checkpoints really frequent increases total disk I/O, both to the
database and to the WAL, significantly. You don't want to do that if
there's another way to achieve the same goal without those costs, which
is what some kernel tuning can do here. Just need to be careful not to
go too far; some write caching at the OS level helps a lot, too.
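
As a rough sketch of how those two settings control checkpoint frequency: a
checkpoint starts when either checkpoint_segments worth of WAL has been
written or checkpoint_timeout has elapsed, whichever comes first. The code
below assumes the default 16MB WAL segment size and a steady WAL write rate.
The function names and the example rate are hypothetical, and it only
estimates how often checkpoints happen, not the extra I/O each one costs.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (assumed)


def checkpoint_interval_s(wal_bytes_per_s, checkpoint_segments,
                          checkpoint_timeout_s):
    """Seconds between checkpoints under a steady WAL write rate.

    Whichever trigger fires first wins: filling checkpoint_segments
    segments of WAL, or reaching checkpoint_timeout.
    """
    if wal_bytes_per_s <= 0:
        # No WAL traffic: only the timeout can trigger a checkpoint.
        return float(checkpoint_timeout_s)
    fill_time = checkpoint_segments * WAL_SEGMENT_BYTES / wal_bytes_per_s
    return min(fill_time, float(checkpoint_timeout_s))


def checkpoints_per_hour(wal_bytes_per_s, checkpoint_segments,
                         checkpoint_timeout_s):
    """How many checkpoints an hour of that workload would produce."""
    interval = checkpoint_interval_s(wal_bytes_per_s, checkpoint_segments,
                                     checkpoint_timeout_s)
    return 3600 / interval


if __name__ == "__main__":
    rate = 1024 * 1024  # hypothetical workload: 1 MiB/s of WAL
    for segments in (3, 16, 64):
        n = checkpoints_per_hour(rate, segments, 300)
        print(f"checkpoint_segments={segments}: {n:.1f} checkpoints/hour")
```

With these made-up numbers, dropping checkpoint_segments from 64 to 3 takes
the server from a checkpoint every five minutes to one every 48 seconds,
which is where the extra database and WAL I/O comes from.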
I'm not saying that those kernel parameters are "useless"; I'm saying
they are used in the same way as checkpoint_segments, checkpoint_timeout
and checkpoint_completion_target are used by PostgreSQL; and on a
PostgreSQL-only system I would rather have PostgreSQL look after the
fsync calls, not the OS.
Except that PostgreSQL doesn't look after the fsync calls yet. I wrote
a patch for 9.1 that spread out the sync calls, similarly to how the
writes are spread out now. I wasn't able to prove an improvement
sufficient to commit the result. In the Linux case, the OS has more
information to work with about how to schedule I/O efficiently given how
the hardware is acting, and it's not possible for PostgreSQL to know all
that--not without duplicating a large portion of the kernel development
work at least. Right now, relying on the kernel means that any
improvements there magically apply to any PostgreSQL version. So far
the results there have been beating out improvements made to the
database fast enough that it's hard to innovate in this area within
Postgres.
--
Greg Smith 2ndQuadrant US greg@xxxxxxxxxxxxxxx Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)