On Thu, 28 May 2009, Flavio Henrique Araque Gurgel wrote:
It is 2.6.24. We had to apply the kswapd patch also. It's important,
especially if you see your system % going as high as 99% in top and
losing control of the machine. I have read that 2.6.28 had
this patch accepted in mainstream.
It would help if you gave more specific information about what you're
talking about. I know there was a bunch of back and forth on the "kswapd
should only wait on IO if there is IO" patch, where it was committed and
then reverted etc, but it's not clear to me if that's what you're talking
about--and if so, what that has to do with the context switch problem.
Back to Fabrix's problem. You're fighting a couple of losing battles
here. Let's go over the initial list:
1) You have 32 cores. You think they should be allowed to schedule
3500 active connections across them. That doesn't work, and what happens
is exactly the sort of context switch storm you're showing data for.
Think about it for a minute: how many of those can really be doing work
at any time? 32, that's how many. Now, you need some multiple of the
number of cores to try to make sure everybody is always busy, but that
multiple should be closer to 10X the number of cores rather than 100X.
You need to adjust the connection pool ratio so that the PostgreSQL
max_connections is closer to 500 than 5000, and this is by far the most
critical thing for you to do. The PostgreSQL connection handler is known
to be bad at handling high connection loads compared to the popular
pooling projects, so you really shouldn't throw this problem at it.
While kernel problems stack on top of that, you really shouldn't start at
kernel fixes; nail the really fundamental and obvious problem first.
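
To put rough numbers on point 1, here is a back-of-the-envelope sketch in
Python. The function name and the default multiple of 10 are illustrative
assumptions taken from the rule of thumb above, not anything PostgreSQL or a
pooler provides; the 32 cores and 3500 connections are the figures from this
thread.

```python
def suggested_active_connections(cores: int, multiple: int = 10) -> int:
    """Rough ceiling on concurrently active backends.

    Hypothetical helper: a small multiple of the core count, per the
    "closer to 10X than 100X" rule of thumb.
    """
    return cores * multiple


cores = 32
current_connections = 3500

target = suggested_active_connections(cores)
print(f"current: about {current_connections / cores:.0f} connections per core")
print(f"target at 10X: {target} connections")
```

That lands in the low hundreds, which is why a max_connections nearer 500
than 5000 is the sensible range, with the pooler queueing everything beyond
that.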
2) You have very new hardware and a very old kernel. Once you've done the
above, if you're still not happy with performance, at that point you
should consider using a newer one. It's fairly simple to build a Linux
kernel using the same basic kernel parameters as the stock RedHat one.
2.6.28 is six months old now, is up to 2.6.28.10, and has gotten a lot
more testing than most kernels due to it being the Ubuntu 9.04 default.
I'd suggest you try out that version.
3) A system with 128GB of RAM is in a funny place where using the
defaults or the usual rules of thumb for a lot of parameters ("set
shared_buffers to 1/4 of RAM") is a bad idea. shared_buffers seems to
top out its usefulness around 10GB on current generation
hardware/software, and some Linux memory tunables have defaults on 2.6.18
that are insane for your system; vm.dirty_ratio at 40 comes to mind as the
one I run into most. Some of that gets fixed just by moving to a newer
kernel, some doesn't. Again, these aren't the problems you're having now
though; they're the ones you'll have in the future *if* you fix the more
fundamental problems first.
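
For a sense of scale on point 3, a small Python sketch of the arithmetic. It
treats the dirty ratio as a percentage of total RAM, which is a
simplification of how the kernel actually computes the limit, and the helper
name is made up for illustration. The 128GB, 1/4 rule, 10GB and 40 figures
are the ones mentioned above.

```python
GIB = 1024 ** 3


def dirty_limit_bytes(ram_bytes: int, dirty_ratio_percent: int) -> int:
    """Approximate dirty page cache allowed before writers must flush.

    Hypothetical helper; assumes the ratio applies to total RAM.
    """
    return ram_bytes * dirty_ratio_percent // 100


ram = 128 * GIB

# The "1/4 of RAM" rule of thumb for shared_buffers.
quarter_rule = ram // 4
# Roughly where shared_buffers stops helping, per the text above.
practical_cap = 10 * GIB
# Dirty data that could pile up with the ratio set to 40.
dirty_at_40 = dirty_limit_bytes(ram, 40)

print(f"1/4 rule for shared_buffers: {quarter_rule / GIB:.0f} GiB")
print(f"practical shared_buffers cap: {practical_cap / GIB:.0f} GiB")
print(f"dirty memory allowed at ratio 40: {dirty_at_40 / GIB:.1f} GiB")
```

Tens of gigabytes of unwritten data waiting on a flush is the kind of number
that makes the old default a poor fit for a machine this size.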
--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)