Why does autovacuum clean fewer rows than I expect?

Nick Cabatoff <nick.cabatoff@xxxxxxxxx> · Thu, 3 Jul 2014 17:07:37 -0400

Hi,

I'm having trouble making sense of the these two autovacuum log entries.  
I'm running PostgreSQL 8.4.

[2014-05-22 04:56:43.486 EST]  {537cf2c6.30f9}  LOG:  automatic vacuum of table "postgres.globaldicom.requests": index scans: 1
        pages: 0 removed, 163600 remain
        tuples: 5916 removed, 8423598 remain
        system usage: CPU 3.62s/9.84u sec elapsed 1060.49 sec

[2014-05-24 05:02:12.707 EST]  {537f90c3.7144}  LOG:  automatic vacuum of table "postgres.globaldicom.requests": index scans: 2
        pages: 0 removed, 510725 remain
        tuples: 1554 removed, 8583343 remain
        system usage: CPU 9.19s/16.86u sec elapsed 2673.32 sec

I've verified that no vacuum (manual or auto) took place between them unless
they were basically no-ops, since log_autovacuum_min_duration=250.  Otherwise
we're using the stock autovacuum config settings, thus 3 workers, 1min naptime,
autovacuum_vacuum_scale_factor=0.2, vacuum_cost_delay=0ms, etc.

First question: how do we get a 3x page usage increase with a <1% tuple
increase?  The only thing I can think of is that a lot of very big values were
stored.  To my knowledge that didn't happen, but I can't rule it out entirely.
Since I judge it quite unlikely I'm looking for alternate explanations of what
might have happened.

Second question: What triggered the above autovacuums?  From what I understand
it must be one of imminent txid wraparound or exceeding
autovacuum_vacuum_scale_factor.  If it was the latter, suggesting that 20% of
the 8.5M tuples had changed, then why were only a few thousand tuples removed?
I'm not sure how to eliminate the possibility that it was the former after the
fact.  However, I know that there was a lot of activity, going by the
auto-analyzes that took place (autovacuum_vacuum_analyze_factor=0.1):

[2014-05-20 11:08:59.632 EST]  {537aab0a.b704}  LOG:  automatic analyze of table "postgres.globaldicom.requests" system usage: CPU 0.36s/0.45u sec elapsed 32.98 sec
[2014-05-23 21:12:49.164 EST]  {537f2d12.459b}  LOG:  automatic analyze of table "postgres.globaldicom.requests" system usage: CPU 0.08s/0.36u sec elapsed 30.60 sec
[2014-05-24 00:56:00.927 EST]  {537f6161.2b77}  LOG:  automatic analyze of table "postgres.globaldicom.requests" system usage: CPU 0.16s/0.38u sec elapsed 31.42 sec
[2014-05-24 06:33:16.972 EST]  {537fb06d.e9e3}  LOG:  automatic analyze of table "postgres.globaldicom.requests" system usage: CPU 0.15s/0.35u sec elapsed 31.13 sec

This isn't an isolated incident, I see many autovacuum log entries where only a
few tuples are removed from a very large table, and I can't figure out why it
was being vacuumed if that's all there was to be removed.