On 06/24/2011 02:55 PM, Shaun Thomas wrote:
On 06/24/2011 11:18 AM, Greg Smith wrote:
sync=14525.296 s, total=14786.868 s
Whaaaaaaaaat!? 6% of 8GB is just shy of 500MB. That's not a small
amount, exactly, but it took 14525 seconds to call syncs for those
writes? What kind of ridiculous IO would cause something like that?
That's even way beyond an OS-level dirty buffer flush on a massive
system. Wow!
It is in fact smaller than the write cache on the disk array involved.
The mystery is explained at
http://projects.2ndquadrant.it/sites/default/files/WriteStuff-PGCon2011.pdf
The most relevant part:
-Background writer stops working normally while running sync
-Never pauses to fully consume the fsync queues backends fill
-Once filled, all backend writes do their own fsync
-Serious competition for the checkpoint writes
When the background writer's fsync queue fills, and you have 100 clients
all doing their own writes and making an fsync call after each one of
them (the case on this server), the background writer ends up only
getting around 1/100 of the I/O capabilities of the server available in
its time slice. And that's how a sync phase that might normally take
two minutes on a really busy server ends up running for hours instead.
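
As a rough back-of-the-envelope check on that claim, here is a minimal sketch. It assumes the checkpoint sync gets an equal share of I/O alongside every competing backend, which is a simplification rather than a model of the real scheduler. The function name and the 120-second baseline are illustrative only.

```python
def estimated_sync_seconds(baseline_seconds: float, competing_writers: int) -> float:
    """Estimate sync-phase duration under an equal-share I/O assumption.

    baseline_seconds: sync time when the checkpoint has the I/O to itself.
    competing_writers: number of processes splitting the I/O evenly
                       (hypothetical simplification: each gets 1/N).
    """
    if competing_writers < 1:
        raise ValueError("competing_writers must be at least 1")
    # Getting a 1/N share of the I/O stretches the work by a factor of N.
    return baseline_seconds * competing_writers


# Two-minute sync, I/O split about 100 ways as described above.
estimate = estimated_sync_seconds(120, 100)
print(f"{estimate:.0f} s (~{estimate / 3600:.1f} hours)")
```

Under those assumptions the estimate comes out at 12000 seconds, which is the same order of magnitude as the 14525 seconds in the log line.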
The improvement in 9.1 gets the individual backends involved in trying
to compact the fsync queue when they discover it is full, which seems to
make the worst-case behavior here much better.
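
To illustrate the idea only, here is a minimal sketch of queue compaction. It assumes that "compacting" means dropping duplicate fsync requests for the same file. The names, the request shape, and the fixed capacity are all hypothetical, and this is not the PostgreSQL code.

```python
from typing import List, Tuple

# Hypothetical request shape: (relation file name, segment number).
FsyncRequest = Tuple[str, int]


def compact_queue(queue: List[FsyncRequest]) -> List[FsyncRequest]:
    """Return the queue with duplicate requests removed, keeping first-seen order."""
    seen = set()
    compacted = []
    for request in queue:
        if request not in seen:
            seen.add(request)
            compacted.append(request)
    return compacted


def enqueue(queue: List[FsyncRequest], request: FsyncRequest, capacity: int) -> bool:
    """Try to queue a request; compact first if the queue is full.

    Returns False when the queue is still full after compaction, which in
    this sketch stands for the backend having to do its own fsync.
    """
    if len(queue) >= capacity:
        queue[:] = compact_queue(queue)  # compact in place
    if len(queue) >= capacity:
        return False
    queue.append(request)
    return True


# A full queue that mostly repeats one file frees up room once compacted.
queue = [("rel_a", 0), ("rel_a", 0), ("rel_a", 0), ("rel_b", 0)]
accepted = enqueue(queue, ("rel_c", 0), capacity=4)
print(accepted, queue)
```

In this toy case the full queue shrinks from four entries to two, so the new request is accepted instead of forcing the backend to fsync on its own.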
--
Greg Smith 2ndQuadrant US greg@xxxxxxxxxxxxxxx Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)