Re: Adding more memory = huge cpu load

Greg Smith <greg@xxxxxxxxxxxxxxx> · Tue, 11 Oct 2011 02:50:46 -0400

On 10/10/2011 12:14 PM, Leonardo Francalanci wrote:

database makes the fsync call, and suddenly the OS wants to flush 2-6GB of data
straight to disk. Without that background trickle, you now have a flood that
only the highest-end disk controller or a backing-store full of SSDs or PCIe
NVRAM could ever hope to absorb.

Isn't checkpoint_completion_target supposed to deal exactly with that problem?

checkpoint_completion_target spreads out the writes to disk.  
PostgreSQL doesn't make any attempt yet to spread out the sync calls.  
On a busy server, what can happen is that the whole OS write cache fills 
with dirty data--none of which is written out to disk because of the 
high kernel threshold--and then it all slams onto disk fast once the 
checkpoint starts executing sync calls.  Lowering the size of the Linux 
write cache helps with that a lot, but can't quite eliminate the problem.
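
To put rough numbers on how that write cache grows with RAM, here is a 
minimal sketch.  It assumes the percentage-based settings 
vm.dirty_background_ratio and vm.dirty_ratio are in use and, as a 
simplification, applies them to total RAM; the kernel really applies 
them to a somewhat smaller "dirtyable" memory figure.  The function name 
and the sample values are made up for illustration.

```python
def dirty_thresholds_bytes(ram_bytes, background_ratio, dirty_ratio):
    """Approximate Linux dirty-page thresholds, in bytes.

    Simplifying assumption: the ratios are applied to total RAM rather
    than to the kernel's own "dirtyable memory" estimate.
    Returns (background_threshold, hard_threshold).
    """
    if not (0 <= background_ratio <= 100 and 0 <= dirty_ratio <= 100):
        raise ValueError("ratios are percentages between 0 and 100")
    # Background writeback starts once this much dirty data accumulates.
    background = ram_bytes * background_ratio // 100
    # Writers are throttled once this much dirty data accumulates.
    hard = ram_bytes * dirty_ratio // 100
    return background, hard


GIB = 1024 ** 3

# Illustrative only: the same ratios on a small and a large machine.
for ram_gib in (4, 32):
    bg, hard = dirty_thresholds_bytes(ram_gib * GIB, 10, 20)
    print(f"{ram_gib} GiB RAM: background {bg / GIB:.1f} GiB, hard {hard / GIB:.1f} GiB")
```

With the same 10%/20% ratios, going from 4 GiB to 32 GiB of RAM moves the 
hard threshold from about 0.8 GiB to about 6.4 GiB, which is how adding 
memory alone can make the sync-time flood worse.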

Plus: if 2-6GB is too much, why not decrease checkpoint_segments? Or
checkpoint_timeout?

Making checkpoints really frequent increases total disk I/O, both to the 
database and to the WAL, significantly.  You don't want to do that if 
there's another way to achieve the same goal without those costs, which 
is what some kernel tuning can do here.  Just need to be careful not to 
go too far; some write caching at the OS level helps a lot, too.
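
As a back-of-the-envelope sketch of how those two settings control 
checkpoint frequency: a checkpoint starts when checkpoint_segments WAL 
segments have been filled or checkpoint_timeout has elapsed, whichever 
comes first.  The code below assumes the default 16MB WAL segment size 
and a steady WAL generation rate; the function name and the rates are 
hypothetical.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # assumes the default 16MB segment size


def checkpoint_interval_seconds(checkpoint_segments, checkpoint_timeout_s,
                                wal_bytes_per_s):
    """Estimate seconds between checkpoints at a steady WAL write rate.

    A checkpoint is triggered by whichever limit is reached first:
    the segment count or the timeout.
    """
    if wal_bytes_per_s <= 0:
        # No WAL traffic: only the timeout can trigger a checkpoint.
        return float(checkpoint_timeout_s)
    by_segments = checkpoint_segments * WAL_SEGMENT_BYTES / wal_bytes_per_s
    return min(float(checkpoint_timeout_s), by_segments)


# Illustrative only: 4 MiB/s of WAL with a 5 minute timeout.
rate = 4 * 1024 * 1024
for segments in (3, 16, 64):
    interval = checkpoint_interval_seconds(segments, 300, rate)
    print(f"checkpoint_segments={segments}: about {interval:.0f}s between checkpoints")
```

Shorter intervals mean the same hot pages get written out more often, 
which is where the extra I/O comes from.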

I'm not saying that those kernel parameters are "useless"; I'm saying
they are used in the same way as checkpoint_segments, checkpoint_timeout and
checkpoint_completion_target are used by postgresql; and on a postgresql-only
system I would rather have postgresql look after the fsync calls, not the OS.

Except that PostgreSQL doesn't look after the fsync calls yet.  I wrote 
a patch for 9.1 that spread out the sync calls, similarly to how the 
writes are spread out now.  I wasn't able to prove an improvement 
sufficient to commit the result.  In the Linux case, the OS has more 
information to work with about how to schedule I/O efficiently given how 
the hardware is acting, and it's not possible for PostgreSQL to know all 
that--not without duplicating a large portion of the kernel development 
work at least.  Right now, relying on the kernel means that any 
improvements there magically apply to any PostgreSQL version.  So far 
the results there have been beating out improvements made to the 
database fast enough that it's hard to innovate in this area within 
Postgres.
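
To illustrate just the general idea of spreading out sync calls, here 
is a toy sketch.  It is not the patch mentioned above and says nothing 
about how PostgreSQL's checkpointer is written; the function name, the 
fixed pause, and the injectable sleep argument are all assumptions made 
for the example.

```python
import os
import time


def spread_fsync(paths, pause_s, sleep=time.sleep):
    """fsync each file in turn, pausing between calls.

    Returns the number of files synced.  The pause gives the OS and the
    disks time to absorb one file's dirty data before the next sync.
    """
    count = 0
    for index, path in enumerate(paths):
        if index > 0:
            sleep(pause_s)  # no pause before the first file
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)  # block until this file's data reaches stable storage
        finally:
            os.close(fd)
        count += 1
    return count
```

A fixed pause like this knows nothing about how busy the disks are, 
which is the kind of information the kernel has and the database 
doesn't.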

--
Greg Smith   2ndQuadrant US    greg@xxxxxxxxxxxxxxx   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
