Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1

Lonni J Friedman <netllama@xxxxxxxxx> · Tue, 22 May 2012 11:20:54 -0700

No one has any ideas or suggestions, or even questions?  If someone
needs more information, I'd be happy to provide it.

This problem is absolutely killing me.

On Mon, May 21, 2012 at 2:05 PM, Lonni J Friedman <netllama@xxxxxxxxx> wrote:
> Greetings,
> I have a 4 server postgresql-9.1.3 cluster (one master doing streaming
> replication to 3 hot standby servers).  All of them are running
> Fedora-16-x86_64.  Last Friday I upgraded the entire cluster from
> Fedora-15 with postgresql-9.0.6 to Fedora-16 with postgresql-9.1.3.  I
> made no changes to postgresql.conf following the upgrade.  I used
> pg_upgrade on the master to upgrade it, followed by blowing away
> $PGDATA on all the standbys and rsyncing them fresh from the master.
> All of the servers have 128GB RAM, and at least 16 CPU cores.
>
> Everything appeared to be working fine until last night when the load
> on the master suddenly took off, and hovered at around 30.00 ever
> since.  Prior to the load spike, the load was hovering around 2.00
> (which is actually lower than it was averaging prior to the upgrade
> when it was often around 4.00).  When I got in this morning, I found
> an autovacuum process that had been running since just before the load
> spiked, and the pg_dump cronjob that started shortly after the load
> spike (and normally completes in about 20 minutes for all the
> databases) was still running, and hadn't finished the first of the 6
> databases.  I ended up killing the pg_dump process altogether in the
> hope that it might unblock whatever was causing the high load.
> Unfortunately that didn't help, and the load continued to run high.
>
> I proceeded to check dmesg, /var/log/messages and the postgresql
> server log (all on the master), but I didn't spot anything out of the
> ordinary, definitely nothing that pointed to a potential explanation
> for all of the high load.
>
> I inspected what the autovacuum process was doing, and determined that
> it was chewing away on the largest table (nearly 98 million rows) in
> the largest database.  It was making very slow progress, at least I
> believe that was the case, as when I attached strace to the process,
> the seek addresses were changing in a random fashion.
>
> Here are the current autovacuum settings:
> autovacuum                      | on
> autovacuum_analyze_scale_factor | 0.1
> autovacuum_analyze_threshold    | 50
> autovacuum_freeze_max_age       | 200000000
> autovacuum_max_workers          | 4
> autovacuum_naptime              | 1min
> autovacuum_vacuum_cost_delay    | 20ms
> autovacuum_vacuum_cost_limit    | -1
> autovacuum_vacuum_scale_factor  | 0.2
> autovacuum_vacuum_threshold     | 50
>
> Did something significant change in 9.1 that would impact autovacuum
> behavior?  I'm at a complete loss on how to debug this, since I'm
> using the exact same settings now as prior to the upgrade.
>
> thanks

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general