Hello all!
Running a vacuum from an 8.2 client connected to an 8.2 database (the
following example was for a 14-row static table): the vacuum will sit
(for lack of a better word) for anywhere from ten to twenty minutes
before taking a lock out on the table and doing work there. Once the
vacuum does complete, I noticed that no timestamp is registered in
pg_stat_all_tables for that relation in the last_vacuum column
(analyze, however, does seem to set its timestamp). I asked it to run a
vacuum on an index (knowing it would fail), and again the vacuum sat
for several minutes before finally erroring out saying that it couldn't
vacuum an index. Out of curiosity I tried the vacuum from an 8.1 client
connected to the 8.2 db; same delay.
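For reference, a query along these lines is what I mean by checking the
timestamps (the table name here is just a placeholder):

    SELECT relname, last_vacuum, last_autovacuum,
           last_analyze, last_autoanalyze
      FROM pg_stat_all_tables
     WHERE relname = 'my_static_table';   -- placeholder table name

last_analyze comes back populated; last_vacuum stays null.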
Running truss on the backend while the vacuum is in this state shows
over five minutes where the process seems to be scanning pg_class (at
least that's the only table listed in pg_locks for this process).
Following this it drops into a cycle of issuing the same send() call
with several seconds of lag between each one, and every so often it
catches the same interrupt (SIGUSR1) and then goes back into the same
cycle of send() calls. Also, whatever it is doing during this stage, it
isn't checking for query-cancel interrupts: the process won't recognize
it has been asked to cancel until it breaks out of this cycle of
send()s and SIGUSR1s (which can go on for another several minutes). I'm
happy to send along the gory details of the truss output if you think
it would be helpful...
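Something like this is what I was looking at in pg_locks while the
backend sat there (the pid is just an example value for the vacuuming
backend):

    SELECT locktype, relation::regclass AS rel, mode, granted
      FROM pg_locks
     WHERE pid = 12345;   -- example pid of the vacuuming backend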
Any ideas what the vacuum is doing, or prepping for, during this stage
that it could become so bogged down in before finally taking the lock
on the table?
Is the lack of a timestamp set for last_vacuum in pg_stat_all_tables an
indication that there may be something incomplete about our install?
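In case it points to a setup problem, the stats collector settings can
be checked with plain SHOW commands (these are the 8.2 setting names,
as far as I know):

    SHOW stats_start_collector;
    SHOW stats_row_level;
    SHOW stats_block_level;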
Since the upgrade, we've also seen unusual lag in simple inserts into
tables (atomic inserts have been seen running for several seconds) and
extreme delays in running \d on tables (I got tired of counting past
two minutes; connecting with an 8.1 client gives an immediate response
for the same command). We plan to upgrade to 8.2.1 as soon as possible,
and also to drop into single-user mode and run a REINDEX SYSTEM, but
any suggestions in the meantime as to a potential cause, or a way to
further debug the vacuums, would be greatly appreciated.
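For what it's worth, the catalog's own size estimates for pg_class and
its indexes can be pulled like this (relpages/reltuples are only as
fresh as the last vacuum/analyze, so treat them as rough numbers):

    SELECT relname, relkind, relpages, reltuples
      FROM pg_class
     WHERE relname IN ('pg_class', 'pg_class_oid_index',
                       'pg_class_relname_nsp_index')
     ORDER BY relname;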
OS: Solaris 10
write transactions/hr: 1.5 million
size of pg_class: 535,226
number of relations: 108,694
Thanks to all,
Kim