Re: Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1

Lonni J Friedman <netllama@xxxxxxxxx> · Wed, 23 May 2012 10:09:49 -0700

On Wed, May 23, 2012 at 9:37 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Lonni J Friedman <netllama@xxxxxxxxx> writes:
>> After banging my head on the wall for  a long time, I happened to
>> notice that khugepaged was consuming 100% CPU every time autovacuum
>> was running.  I did:
>> echo "madvise" > /sys/kernel/mm/transparent_hugepage/defrag
>> and immediately the entire problem went away.
>
> Fascinating.

In hindsight, sure.  Before that, it was 2 days of horror.

>
>> So this looks like a nasty Fedora16 kernel bug to me, or maybe
>> postgresql & Fedora16's default kernel settings are just not
>> compatible?
>
> I agree, kernel bug.  What kernel version are you using exactly?

I'm using the stock 3.3.5-2.fc16.x86_64 kernel that is in Fedora updates.

>
>> Is anyone else using Fedora16 & PostgreSQL-9.1 ?
>
> I use an F16 box daily, but can't claim to have done major performance
> testing with it.  Can you put together a summary of your nondefault
> Postgres settings?  I wonder whether it only kicks in for a certain
> size of shared memory for instance.

Oh yea, I'm quite certain that this is somehow related to my setup,
and not a generic problem with all F16/pgsql systems.  For starters,
this problem isn't happening on any of the 3 standby systems, which
are all otherwise identical to the master in every respect.  Also when
we had done some testing (prior to the upgrades), we never ran into
any of these problems.  However our test environment was on smaller
scale hardware, with a much smaller number of clients (and overall
load).

Here are the non default settings in postgresql.conf :
wal_level = hot_standby
archive_mode = on
archive_timeout = 61
max_wal_senders = 10
wal_keep_segments = 5000
hot_standby = on
log_autovacuum_min_duration = 2500
autovacuum_max_workers = 4
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
effective_cache_size = 88GB
work_mem = 576MB
wal_buffers = 16MB
checkpoint_segments = 64
shared_buffers = 8GB
max_connections = 350

Let me know if you have any other questions.  I'd be happy to provide
as much information as possible if it can aid in fixing this bug.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general