John R Pierce wrote:
we're having a similar problem with a very update-intensive database that is part of a 24/7 manufacturing operation (no breathing time unless there's an unusual line-down situation)
Your problem is actually a bit different.
2010-01-23 01:08:13 MYTLOG: checkpoint complete: wrote 7535 buffers (2.9%); 0 transaction log file(s) added, 0 removed, 20 recycled; write=86.510 s, sync=0.260 s, total=86.791 s
2010-01-23 01:13:22 MYTLOG: checkpoint complete: wrote 12786 buffers (4.9%); 0 transaction log file(s) added, 0 removed, 17 recycled; write=95.028 s, sync=0.135 s, total=95.182 s
2010-01-23 01:16:22 MYTLOG: checkpoint complete: wrote 11720 buffers (4.5%); 0 transaction log file(s) added, 0 removed, 19 recycled; write=41.854 s, sync=0.640 s, total=42.518 s
2010-01-23 01:19:11 MYTLOG: checkpoint complete: wrote 10436 buffers (4.0%); 0 transaction log file(s) added, 0 removed, 20 recycled; write=50.330 s, sync=0.510 s, total=50.861 s
Your interval between checkpoints isn't too bad--3 minutes on the worst of these. And the amount written per checkpoint isn't that high either: 12786 buffers = ~100MB. Your problem is that the sync on that 100MB write is taking an awfully long time. Spacing your checkpoints out further would help, sure, but I wouldn't be surprised to find it doesn't do much for you. Worth a shot, unlikely to hurt anything, just not optimistic about it being the right thing the way I was on the other message.
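To double-check that arithmetic, here's a quick sketch converting the logged buffer counts to megabytes (this assumes the default PostgreSQL block size of 8kB; if BLCKSZ was changed at build time the numbers scale accordingly):

```python
# Convert checkpoint "wrote N buffers" counts into megabytes,
# assuming PostgreSQL's default 8 kB block size (BLCKSZ).
BLOCK_SIZE_KB = 8

def buffers_to_mb(buffers):
    """Approximate size in MB of `buffers` shared buffers."""
    return buffers * BLOCK_SIZE_KB / 1024

for n in (7535, 12786, 11720, 10436):
    print(f"{n} buffers ~= {buffers_to_mb(n):.0f} MB")
# 12786 buffers comes out to ~100 MB, matching the estimate above
```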
This is running on a 16-core UltraSPARC IV server, Solaris 10, ZFS, tons of disks on a SAN, with multiple tablespaces on different ZFS filesystems with the block sizes tuned appropriately.
My guess is that there's something wrong with your config such that writes followed by fsync are taking longer than they should. When I see "sync=0.640 s" into a SAN where that sync operation should be near instant, I'd be looking for issues in the ZFS intent log setup, how much data you've set up ZFS to write-cache, and the write cache policy on the SAN hardware. There's something weird going on here; that sync should be near instant on your hardware with such a small write volume.
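One way to sanity-check raw write+fsync latency outside the database is a small probe along these lines (a rough sketch, not a tuned benchmark; the file path, block count, and zero-filled data are arbitrary choices, and you'd want to point it at the same filesystem your WAL lives on):

```python
import os
import tempfile
import time

BLOCK = 8 * 1024    # match PostgreSQL's 8 kB block size
NBLOCKS = 12800     # ~100 MB, similar to the worst checkpoint above

def timed_write_and_sync(path, nblocks=NBLOCKS, block=BLOCK):
    """Write nblocks * block bytes, then fsync; return (write_s, sync_s)."""
    buf = b"\0" * block
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        t0 = time.monotonic()
        for _ in range(nblocks):
            os.write(fd, buf)
        t1 = time.monotonic()
        os.fsync(fd)    # the step that is suspiciously slow in the logs above
        t2 = time.monotonic()
    finally:
        os.close(fd)
    return t1 - t0, t2 - t1

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
    try:
        write_s, sync_s = timed_write_and_sync(path)
        print(f"write={write_s:.3f} s, sync={sync_s:.3f} s")
    finally:
        os.unlink(path)
```

On a SAN with a healthy battery-backed write cache, the sync figure from a run like this should be a few milliseconds; anything approaching the half-second syncs in the checkpoint logs points at the caching layers rather than at PostgreSQL.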
Unfortunately, you may have to take the server down to find out exactly what's going on, which doesn't sound practical for your environment. (And people wonder why I have near-religious zeal about testing disk hardware before systems go into production.)
--
Greg Smith  2ndQuadrant  Baltimore, MD
PostgreSQL Training, Services and Support
greg@xxxxxxxxxxxxxxx  www.2ndQuadrant.com

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)