Greg Smith wrote:
2010-01-22 12:21:48 JSTLOG: checkpoint complete: wrote 83874 buffers
(16.0%); 0 transaction log file(s) added, 0 removed, 32 recycled;
write=138.040 s, sync=0.000 s, total=138.063 s
2010-01-22 12:23:32 JSTLOG: checkpoint complete: wrote 82856 buffers
(15.8%); 0 transaction log file(s) added, 0 removed, 32 recycled;
write=18.740 s, sync=0.000 s, total=18.783 s
2010-01-22 12:24:26 JSTLOG: checkpoint complete: wrote 75145 buffers
(14.3%); 0 transaction log file(s) added, 0 removed, 32 recycled;
write=12.129 s, sync=0.000 s, total=12.132 s
2010-01-22 12:25:30 JSTLOG: checkpoint complete: wrote 82108 buffers
(15.7%); 0 transaction log file(s) added, 0 removed, 32 recycled;
write=10.619 s, sync=0.000 s, total=10.621 s
2010-01-22 12:28:03 JSTLOG: checkpoint complete: wrote 87349 buffers
(16.7%); 0 transaction log file(s) added, 0 removed, 32 recycled;
write=82.190 s, sync=0.000 s, total=82.192 s
Here you're getting a checkpoint every minute or three, and each of
them is writing out ~80000 buffers = 625MB. That is crazy, and no
wonder your queries are slow--the system is spending all of its time
doing constant, extremely expensive checkpoints.
.....
we're having a similar problem with a very update intensive database
that is part of a 24/7 manufacturing operation (no breathing time unless
there's an unusual line down situtation)
a snip of the log...
2010-01-23 01:02:19 MYTLOG: checkpoint complete: wrote 764 buffers
(0.3%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=31.207 s, sync=0.147 s, total=31.394 s
2010-01-23 01:08:13 MYTLOG: checkpoint complete: wrote 7535 buffers
(2.9%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=86.510 s, sync=0.260 s, total=86.791 s
2010-01-23 01:13:22 MYTLOG: checkpoint complete: wrote 12786 buffers
(4.9%); 0 transaction log file(s) added, 0 removed, 17 recycled;
write=95.028 s, sync=0.135 s, total=95.182 s
2010-01-23 01:16:22 MYTLOG: checkpoint complete: wrote 11720 buffers
(4.5%); 0 transaction log file(s) added, 0 removed, 19 recycled;
write=41.854 s, sync=0.640 s, total=42.518 s
2010-01-23 01:19:11 MYTLOG: checkpoint complete: wrote 10436 buffers
(4.0%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=50.330 s, sync=0.510 s, total=50.861 s
2010-01-23 01:23:19 MYTLOG: checkpoint complete: wrote 6446 buffers
(2.5%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=57.195 s, sync=0.583 s, total=57.790 s
2010-01-23 01:27:54 MYTLOG: checkpoint complete: wrote 4454 buffers
(1.7%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=44.805 s, sync=0.118 s, total=44.943 s
2010-01-23 01:32:36 MYTLOG: checkpoint complete: wrote 4645 buffers
(1.8%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=50.792 s, sync=0.689 s, total=51.502 s
2010-01-23 01:37:13 MYTLOG: checkpoint complete: wrote 5722 buffers
(2.2%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=58.077 s, sync=0.606 s, total=58.707 s
2010-01-23 01:41:20 MYTLOG: checkpoint complete: wrote 5006 buffers
(1.9%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=36.064 s, sync=0.898 s, total=36.986 s
2010-01-23 01:46:14 MYTLOG: checkpoint complete: wrote 3644 buffers
(1.4%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=54.873 s, sync=0.659 s, total=55.554 s
2010-01-23 01:51:15 MYTLOG: checkpoint complete: wrote 5534 buffers
(2.1%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=63.152 s, sync=0.374 s, total=63.546 s
The admin tells me the current parameters are...
checkpoint_segments = 20 # in logfile segments, min 1,
16MB eac
checkpoint_timeout = 5min # range 30s-1h
and I'm suggesting 60 and 15min, does that seem reasonable?
This is running on a 16 core UltrasparcIV server, solaris 10, zfs, tons
of disks on SAN with multiple tablespaces on different ZFS's, with the
block sizes tuned appropriately
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general