On 11/20/2012 04:08 PM, Jeff Janes wrote:
> Shaun Thomas reports one that is (I assume) not read intensive, but his diagnosis is that this is a kernel bug where a larger shared_buffers for no good reason causes the kernel to kill off its page cache.
We're actually very read intensive. According to pg_stat_statements, we regularly top out at 42k queries per second, and pg_stat_database says we're pushing 7k TPS.
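The post doesn't say how those figures were gathered. One common approach, sketched here under that assumption, is to snapshot the cumulative counters twice and divide the deltas by the elapsed time: xact_commit + xact_rollback from pg_stat_database for TPS, and sum(calls) from pg_stat_statements for queries per second. The Snapshot class and rates() function are hypothetical names, and the counter values are invented so that the deltas match the rates quoted above.

```python
# Hypothetical helper: estimate TPS and QPS from two snapshots of
# cumulative PostgreSQL counters taken `elapsed_s` seconds apart.
# Assumed queries used to fill each snapshot (not from the original post):
#   SELECT xact_commit + xact_rollback FROM pg_stat_database
#    WHERE datname = current_database();
#   SELECT sum(calls) FROM pg_stat_statements;
from dataclasses import dataclass


@dataclass
class Snapshot:
    xacts: int  # xact_commit + xact_rollback from pg_stat_database
    calls: int  # sum(calls) from pg_stat_statements


def rates(before: Snapshot, after: Snapshot, elapsed_s: float) -> tuple[float, float]:
    """Return (tps, qps) as per-second deltas between two snapshots."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    tps = (after.xacts - before.xacts) / elapsed_s
    qps = (after.calls - before.calls) / elapsed_s
    return tps, qps


if __name__ == "__main__":
    # Invented counter values, purely to illustrate the arithmetic.
    t0 = Snapshot(xacts=1_000_000, calls=5_000_000)
    t1 = Snapshot(xacts=1_070_000, calls=5_420_000)
    print(rates(t0, t1, 10.0))  # prints (7000.0, 42000.0)
```

A statistics reset between samples (pg_stat_reset() or pg_stat_statements_reset()) would make the deltas meaningless, and sum(calls) only covers statements pg_stat_statements is still tracking, so a real monitor would need to account for both.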
But I'm still sure this is a kernel bug. Moving shared_buffers from 4GB to 6GB or 8GB causes the kernel to cut the active page cache in half, in addition to freeing 1/4 of RAM, which then just sits around doing nothing. That in turn keeps kswapd working constantly while our IO drivers work to undo the damage. It's a positive feedback loop: I can reliably drive the load up to 800+ on an 800-client pgbench with two threads, all while having 0% CPU free.
Set it back to 4GB, and not only does the problem disappear completely, but the load settles down to around 9 and the machine becomes about 60% idle. Something in there is fantastically broken, but I can't point a finger at what.
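To watch the page-cache behaviour described above while a load test runs (for example pgbench -c 800 -j 2 against the target database, where -c sets the number of clients and -j the number of worker threads), one option is to sample /proc/meminfo periodically. This is a minimal, Linux-only sketch, not the tooling used here: parse_meminfo(), sample() and watch() are hypothetical helpers, and treating MemFree, Cached, Active(file) and Inactive(file) as the relevant figures is an assumption.

```python
# Hypothetical monitoring sketch (Linux only): sample page-cache figures
# from /proc/meminfo. Field names are standard /proc/meminfo keys; the
# kernel reports these values in kB.
import time

# Assumed to be the most relevant fields for spotting a shrinking page cache.
FIELDS = ("MemFree", "Cached", "Active(file)", "Inactive(file)")


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo content into {field: numeric value}."""
    out: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that isn't "Key: value [kB]"
        parts = rest.split()
        if parts and parts[0].isdigit():
            out[key.strip()] = int(parts[0])
    return out


def sample(path: str = "/proc/meminfo") -> dict[str, int]:
    """Read the selected fields (kB); missing fields default to 0."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {k: info.get(k, 0) for k in FIELDS}


def watch(interval_s: float = 5.0, samples: int = 12) -> None:
    """Print the selected fields in MB every interval_s seconds."""
    for _ in range(samples):
        snap = sample()
        stamp = time.strftime("%H:%M:%S")
        print(stamp, " ".join(f"{k}={v // 1024}MB" for k, v in snap.items()))
        time.sleep(interval_s)
```

Running watch() during one pgbench run at 4GB of shared_buffers and another at 6GB or 8GB would show whether Active(file) drops and MemFree grows in the way described.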
I was just piping up because, in the absence of an obvious PG-related culprit, the problem could be the OS itself. It certainly was in our case.
That, or PG has a memory leak that only appears at > 4GB of shared buffers.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@xxxxxxxxxxxxxxxx

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)