Greg Smith wrote:
> Richard Neill wrote:
>> Here's the typical checkpoint logs:
>> 2009-12-03 06:21:21 GMT LOG: checkpoint complete: wrote 12400 buffers
>> (2.2%); 0 transaction log file(s) added, 0 removed, 12 recycled;
>> write=149.883 s, sync=5.143 s, total=155.040 s
>
> See that "sync" number there? That's your problem; while that sync
> operation is going on, everybody else is grinding to a halt waiting for
> it. Not a coincidence that the duration is about the same amount of
> time that your queries are getting stuck. This example shows 12400
> buffers = 97MB of total data written. Since those writes are pretty
> random I/O, it's easily possible to get stuck for a few seconds waiting
> for that much data to make it out to disk. You only gave the write
> phase a couple of minutes to spread things out over; meanwhile, Linux
> may not even bother starting to write things out until 30 seconds into
> that, so the effective time between when writes to disk start and when
> the matching sync happens on your system is extremely small. That's not
> good--you have to give that several minutes of breathing room if you
> want to avoid checkpoint spikes.

I wonder how common this issue is. When we implemented spreading of the
write phase, we had long discussions about spreading out the fsyncs too,
but in the end it wasn't done. Perhaps it is time to revisit that, now
that 8.3 has been out for some time and people have experience with
load-distributed checkpoints.

I'm not sure how the spreading of the fsync()s should work; it's hard to
estimate how long each fsync() is going to take, for example. But surely
something would be better than nothing.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
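[Editor's note: a minimal sketch of the fsync-spreading idea discussed above, assuming a fixed, even pause between fsync() calls. This is plain Python for illustration only, not PostgreSQL's checkpointer code; the file setup and the even-pause heuristic are invented, and as the post notes, a real implementation would need to cope with fsync() calls of unpredictable duration.]

```python
import os
import time
import tempfile

def checkpoint_sync(paths, spread_seconds):
    """fsync each file, sleeping between calls so the sync phase is
    spread over roughly `spread_seconds` instead of issued back to back.
    Returns the measured duration of each fsync()."""
    if not paths:
        return []
    pause = spread_seconds / len(paths)  # naive even spacing (illustrative)
    durations = []
    for path in paths:
        fd = os.open(path, os.O_RDWR)
        try:
            start = time.monotonic()
            os.fsync(fd)  # force this file's dirty pages out to disk
            durations.append(time.monotonic() - start)
        finally:
            os.close(fd)
        time.sleep(pause)  # breathing room before the next fsync
    return durations

# Demo: create a few files with unsynced writes, then sync them spread
# over about one second rather than all at once at checkpoint end.
paths = []
for i in range(4):
    fd, path = tempfile.mkstemp()
    os.write(fd, b"x" * 8192)  # one 8 kB "buffer" per file
    os.close(fd)               # closed without fsync: data may still be dirty
    paths.append(path)

durations = checkpoint_sync(paths, spread_seconds=1.0)
print(len(durations), "files synced")
for p in paths:
    os.remove(p)
```

The even spacing is the simplest possible policy; it spreads the I/O bursts out but makes no attempt to predict per-file sync cost, which is exactly the estimation problem raised in the post.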