Thanks for your reply. On Tue, May 22, 2012 at 7:19 PM, Andy Colson <andy@xxxxxxxxxxxxxxx> wrote: > On Mon, May 21, 2012 at 2:05 PM, Lonni J Friedman<netllama@xxxxxxxxx> > wrote: >>> >>> Greetings, >>> >>> When I got in this morning, I found >>> an autovacuum process that had been running since just before the load >>> spiked, > > > Autovacuum might need to set the freeze bit very first time it runs. I > recall hearing advice about running a 'vacuum freeze' after you insert a > huge amount of data. And I recall pg_upgrade doesn't write stats, so did > you analyze your database? yes, I ran a 'vacuum analyze' for all databases & tables immediately following completion of pg_upgrade. > > Or, maybe its not vacuum... maybe some of your sql statements are planning > differently and running really bad. Can you check some? Can you log slow > queries? > > Have you checked the status of your raid? Maybe you lost a drive and its in > recovery and you have very slow IO? I checked that initially, but the array is fine. After banging my head on the wall for a long time, I happened to notice that khugepaged was consuming 100% CPU every time autovacuum was running. I did: echo "madvise" > /sys/kernel/mm/transparent_hugepage/defrag and immediately the entire problem went away. Load dropped within minutes from 35.00 to 1.00, and has remained under 4.00 for the past 18 hours. Prior to disabling defrag, I never saw the load below 10.00 for more than a few seconds at a time. So this looks like a nasty Fedora16 kernel bug to me, or maybe postgresql & Fedora16's default kernel settings are just not compatible? Is anyone else using Fedora16 & PostgreSQL-9.1 ? -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general