I'd say to do some monitoring of your machine while this is happening:
vmstat, iostat, iotop, htop, and so on.  Are you running out of memory?
Seeing a context switch / interrupt storm?  IO bound?  And so on.

On Tue, May 22, 2012 at 12:20 PM, Lonni J Friedman <netllama@xxxxxxxxx> wrote:
> No one has any ideas or suggestions, or even questions?  If someone
> needs more information, I'd be happy to provide it.
>
> This problem is absolutely killing me.
>
> On Mon, May 21, 2012 at 2:05 PM, Lonni J Friedman <netllama@xxxxxxxxx> wrote:
>> Greetings,
>> I have a 4-server postgresql-9.1.3 cluster (one master doing streaming
>> replication to 3 hot standby servers).  All of them are running
>> Fedora-16-x86_64.  Last Friday I upgraded the entire cluster from
>> Fedora-15 with postgresql-9.0.6 to Fedora-16 with postgresql-9.1.3.  I
>> made no changes to postgresql.conf following the upgrade.  I used
>> pg_upgrade on the master to upgrade it, followed by blowing away
>> $PGDATA on all the standbys and rsyncing them fresh from the master.
>> All of the servers have 128GB RAM and at least 16 CPU cores.
>>
>> Everything appeared to be working fine until last night, when the load
>> on the master suddenly took off and has hovered at around 30.00 ever
>> since.  Prior to the spike, the load was hovering around 2.00 (which
>> is actually lower than the average prior to the upgrade, when it was
>> often around 4.00).  When I got in this morning, I found an autovacuum
>> process that had been running since just before the load spiked, and
>> the pg_dump cronjob that started shortly after the load spike (and
>> normally completes in about 20 minutes for all the databases) was
>> still running and hadn't finished the first of the 6 databases.  I
>> ended up killing the pg_dump process altogether in the hope that it
>> might unblock whatever was causing the high load.  Unfortunately that
>> didn't help, and the load has remained high.
>>
>> I proceeded to check dmesg, /var/log/messages and the postgresql
>> server log (all on the master), but I didn't spot anything out of the
>> ordinary, definitely nothing that pointed to a potential explanation
>> for all of the high load.
>>
>> I inspected what the autovacuum process was doing, and determined that
>> it was chewing away on the largest table (nearly 98 million rows) in
>> the largest database.  It was making very slow progress, at least I
>> believe that was the case, as when I attached strace to the process,
>> the seek addresses were changing in a random fashion.
>>
>> Here are the current autovacuum settings:
>> autovacuum                      | on
>> autovacuum_analyze_scale_factor | 0.1
>> autovacuum_analyze_threshold    | 50
>> autovacuum_freeze_max_age       | 200000000
>> autovacuum_max_workers          | 4
>> autovacuum_naptime              | 1min
>> autovacuum_vacuum_cost_delay    | 20ms
>> autovacuum_vacuum_cost_limit    | -1
>> autovacuum_vacuum_scale_factor  | 0.2
>> autovacuum_vacuum_threshold     | 50
>>
>> Did something significant change in 9.1 that would impact autovacuum
>> behavior?  I'm at a complete loss on how to debug this, since I'm
>> using the exact same settings now as prior to the upgrade.
>>
>> thanks
>
> --
> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
To understand recursion, one must first understand recursion.
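[Editor's note] The "context switch / interrupt storm" check suggested at the top of the thread can be sketched without vmstat by sampling /proc/stat directly. This is a minimal sketch assuming a Linux /proc filesystem; the one-second sampling interval is illustrative.

```shell
# Sample total context switches (ctxt) and interrupts (intr) from
# /proc/stat one second apart, then print the per-second rates.
c1=$(awk '/^ctxt/ {print $2}' /proc/stat)   # cumulative context switches
i1=$(awk '/^intr/ {print $2}' /proc/stat)   # cumulative interrupts (first field is the total)
sleep 1
c2=$(awk '/^ctxt/ {print $2}' /proc/stat)
i2=$(awk '/^intr/ {print $2}' /proc/stat)
echo "ctxt/s=$((c2 - c1)) intr/s=$((i2 - i1))"
```

A sustained rate orders of magnitude above the machine's baseline would point toward the storm scenario; `vmstat 1` reports the same counters in its `cs` and `in` columns.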
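[Editor's note] For context on why autovacuum visits the 98-million-row table so rarely (and therefore has so much to do when it runs), the trigger point can be worked out from the settings quoted above. The row count is the approximate figure from the thread; integer arithmetic stands in for scale_factor = 0.2.

```shell
# Dead tuples needed before autovacuum triggers on the big table:
#   trigger = autovacuum_vacuum_threshold + scale_factor * reltuples
reltuples=98000000   # ~98 million rows, per the thread
threshold=50         # autovacuum_vacuum_threshold
trigger=$((threshold + reltuples / 5))   # scale_factor 0.2 == reltuples/5
echo "autovacuum fires after ~$trigger dead tuples"
```

So roughly 19.6 million rows must change before a vacuum starts, which is consistent with a single long, heavy autovacuum pass rather than frequent small ones.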