On Wed, May 23, 2012 at 2:45 PM, Gavin Flower <GavinFlower@xxxxxxxxxxxxxxxxx> wrote: > On 24/05/12 08:18, Lonni J Friedman wrote: > > On Wed, May 23, 2012 at 12:36 PM, Gavin Flower > <GavinFlower@xxxxxxxxxxxxxxxxx> wrote: > > On 24/05/12 05:09, Lonni J Friedman wrote: > > On Wed, May 23, 2012 at 9:37 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > > Lonni J Friedman <netllama@xxxxxxxxx> writes: > > After banging my head on the wall for a long time, I happened to > notice that khugepaged was consuming 100% CPU every time autovacuum > was running. I did: > echo "madvise" > /sys/kernel/mm/transparent_hugepage/defrag > and immediately the entire problem went away. > > Fascinating. > > In hindsight, sure. Before that, it was 2 days of horror. > > So this looks like a nasty Fedora16 kernel bug to me, or maybe > postgresql & Fedora16's default kernel settings are just not > compatible? > > I agree, kernel bug. What kernel version are you using exactly? > > I'm using the stock 3.3.5-2.fc16.x86_64 kernel that is in Fedora updates. > > Is anyone else using Fedora16 & PostgreSQL-9.1 ? > > I use an F16 box daily, but can't claim to have done major performance > testing with it. Can you put together a summary of your nondefault > Postgres settings? I wonder whether it only kicks in for a certain > size of shared memory for instance. > > Oh yea, I'm quite certain that this is somehow related to my setup, > and not a generic problem with all F16/pgsql systems. For starters, > this problem isn't happening on any of the 3 standby systems, which > are all otherwise identical to the master in every respect. Also when > we had done some testing (prior to the upgrades), we never ran into > any of these problems. However our test environment was on smaller > scale hardware, with a much smaller number of clients (and overall > load). > > Here are the non default settings in postgresql.conf : > wal_level = hot_standby > archive_mode = on > archive_timeout = 61 > max_wal_senders = 10 > wal_keep_segments = 5000 > hot_standby = on > log_autovacuum_min_duration = 2500 > autovacuum_max_workers = 4 > maintenance_work_mem = 1GB > checkpoint_completion_target = 0.7 > effective_cache_size = 88GB > work_mem = 576MB > wal_buffers = 16MB > checkpoint_segments = 64 > shared_buffers = 8GB > max_connections = 350 > > Let me know if you have any other questions. I'd be happy to provide > as much information as possible if it can aid in fixing this bug. > > I think they will need details of things like: RAM, number/type processors, > number & type > of disks, disk controllers & any other hardware specs that might be relevant > etc.- at very > least: total RAM & number of spindles > > 16 core Xeon X5550 2.67GHz > 128GB RAM > $PGDATA sits on a RAID5 array comprised of 3 SATA disks. Its Linux's > md software RAID. > > How does this compare to your other machines running the same, or similar, > databases? > However, you do say that the other machines are indentical - but are the > other > machines different in any aspect, that might prove siginificant? > > > > Also anything else running on the box. > > nothing else. its dedicated exclusively to postgresql. > > Plus transaction load pattern - over time and read/write ratios. > > I'm not sure how I'd obtain this data. however, the patterns didn't > change since the upgrade. If someone can point me in the right > direction, I can at least obtain this data as its generated currently. > > type/nature of queries > > I need some clarification on specifically what you're asking for here. > > The complexity, structure, and features of the queries. Do you have lots of > sub queries, > and ORDER BY's? Also the number of tables accessed in a query. This is > heading into the > territory where others will be better placed to advise you as to what might > be relevant! No, not lots of subqueries or ORDERing, and most queries only touch a single table. However, I'm honestly not sure that I'm following where you're going with this. The problem isn't triggered by explicit queries. I can disable all external access, and simply wait for autovacuum to kick off, and the box starts to die. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general