Hey folks, I've got Munin installed on all my systems, so was able to get some interesting data around the big crash we had last night. We'd thought it was simply a matter of our DB connections maxing out, but it looks a bit more complex than that. A good 2 or 3 hours before the connections max (at 300), about 80% of them go to 'waiting'. I'm wondering what other data I should be looking at to see why. Munin also graphs iostat, but unfortunately only the amount of activity. I look at the iostat man page and it looks to me like this might be of interest, no? avgqu-sz The average queue length of the requests that were issued to the device. await The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them. Here is the core of the Munin plugin for anyone who wants to know - written in Perl : my $sql = "select count(*), waiting from pg_stat_activity "; $sql .= " where datname = ? group by waiting "; print "# $sql\n" if $debug; my $sth = $dbh->prepare($sql); $sth->execute($dbname); printf ("# Rows: %d\n", $sth->rows) if $debug; if ($sth->rows > 0) { my $c_waiting = 0; my $c_active = 0; while (my ($count, $waiting) = $sth->fetchrow_array()) { if ($waiting) { $c_waiting = $count; } else { $c_active = $count; } } print "waiting.value $c_waiting\n"; print "active.value $c_active\n"; } $sql = "select count(*) from pg_stat_activity "; $sql .= " where datname = ? and current_query like '<IDLE>%'"; print "# $sql\n" if $debug; $sth = $dbh->prepare($sql); $sth->execute($dbname); printf ("# Rows: %d\n", $sth->rows) if $debug; if ($sth->rows > 0) { my ($idle) = $sth->fetchrow_array(); print "idle.value $idle\n"; } } -- “Don't eat anything you've ever seen advertised on TV” - Michael Pollan, author of "In Defense of Food" -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general