Postgres 8.2.4.
We have a large table, let's call it "foo", whereby an automated process periodically inserts many (hundreds of thousands or millions) rows into it at a time. It's essentially INSERT INTO foo SELECT FROM <another table> WHERE <some conditions>. Recently, for whatever reason, the query started to run out of memory. This happened on the order of 50 times before it was noticed and the process was stopped. (Admittedly, more investigation needs to go into the OOM problem... )
Now autovacuum promptly kicked in trying to clean up this mess, however it couldn't keep up at the rate that dead tuples were being generated. I'm not sure if it got into a weird state. After a few days, long after the inserting process was stopped, we decided to abort the vacuum (which we weren't convinced was doing anything), then start a manual vacuum with a higher vacuum_cost_limit to get things cleaned up quicker.
After 28 hours, here was the output of vacuum verbose:
# VACUUM VERBOSE foo;
INFO: vacuuming "public.foo"
INFO: scanned index "foo_pkey" to remove 44739062 row versions
DETAIL: CPU 5.74s/26.09u sec elapsed 529.57 sec.
INFO: scanned index "foo_1" to remove 44739062 row versions
DETAIL: CPU 760.09s/619.83u sec elapsed 56929.54 sec.
INFO: scanned index "foo_2" to remove 44739062 row versions
DETAIL: CPU 49.35s/99.57u sec elapsed 4410.74 sec.
INFO: "foo": removed 44739062 row versions in 508399 pages
DETAIL: CPU 47.35s/12.88u sec elapsed 3985.92 sec.
INFO: scanned index "foo_pkey" to remove 32534234 row versions
DETAIL: CPU 22.05s/32.51u sec elapsed 2259.05 sec.
INFO: vacuuming "public.foo"
INFO: scanned index "foo_pkey" to remove 44739062 row versions
DETAIL: CPU 5.74s/26.09u sec elapsed 529.57 sec.
INFO: scanned index "foo_1" to remove 44739062 row versions
DETAIL: CPU 760.09s/619.83u sec elapsed 56929.54 sec.
INFO: scanned index "foo_2" to remove 44739062 row versions
DETAIL: CPU 49.35s/99.57u sec elapsed 4410.74 sec.
INFO: "foo": removed 44739062 row versions in 508399 pages
DETAIL: CPU 47.35s/12.88u sec elapsed 3985.92 sec.
INFO: scanned index "foo_pkey" to remove 32534234 row versions
DETAIL: CPU 22.05s/32.51u sec elapsed 2259.05 sec.
The vacuum then just sat there. What I can't understand is why it went back for a second pass of the pkey index? There was nothing writing to the table once the vacuum began. Is this behaviour expected? Are these times reasonable for a vacuum (on a busy system, mind you)?
We have since aborted the vacuum and truncated the table. We're now working on the root OOM problem, which is easier said than done...
Steve