Re: Scalability in postgres

Greg Smith <gsmith@xxxxxxxxxxxxx> · Thu, 28 May 2009 21:54:52 -0400 (EDT)

On Thu, 28 May 2009, Flavio Henrique Araque Gurgel wrote:

> It is 2.6.24. We had to apply the kswapd patch as well. It's important,
> especially if you see your system % going as high as 99% in top and
> losing control of the machine. I have read that 2.6.28 had this patch
> accepted into mainline.

It would help if you gave more specific information about what you're 
talking about.  I know there was a bunch of back and forth on the "kswapd 
should only wait on IO if there is IO" patch, where it was committed and 
then reverted, etc., but it's not clear to me if that's what you're talking 
about--and if so, what that has to do with the context switch problem.

Back to Fabrix's problem.  You're fighting a couple of losing battles 
here.  Let's go over the initial list:

1) You have 32 cores.  You think they should be allowed to schedule
3500 active connections across them.  That doesn't work, and what happens
is exactly the sort of context switch storm you're showing data for. 
Think about it for a minute:  how many of those can really be doing work 
at any time?  32, that's how many.  Now, you need some multiple of the 
number of cores to try to make sure everybody is always busy, but that 
multiple should be closer to 10X the number of cores rather than 100X. 
You need to adjust the connection pool ratio so that the PostgreSQL 
max_connections is closer to 500 than 5000, and this is by far the most 
critical thing for you to do.  The PostgreSQL connection handler is known 
to be bad at handling high connection loads compared to the popular 
pooling projects, so you really shouldn't throw this problem at it. 
While kernel problems stack on top of that, you really shouldn't start at 
kernel fixes; nail the really fundamental and obvious problem first.
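
To put rough numbers on that ratio, here is a minimal sketch in Python. 
The 32 cores, the 3500 connections, and the 10X multiple come from the 
discussion above; the helper name suggested_max_connections is made up 
for illustration and is not part of PostgreSQL or any pooler.

```python
# Rough sizing sketch. suggested_max_connections is a hypothetical helper,
# not a PostgreSQL or connection pooler API.

def suggested_max_connections(cores: int, multiple: int = 10) -> int:
    """Target backend count as a small multiple of the core count."""
    return cores * multiple

cores = 32                  # from the hardware described in this thread
client_connections = 3500   # active connections currently being scheduled

target = suggested_max_connections(cores)    # 10X the number of cores
per_core_now = client_connections / cores    # current connections per core

print(f"target backends: {target}")
print(f"connections per core today: {per_core_now:.0f}")
```

That lands in the "closer to 500 than 5000" range, with the pooler 
queueing the remaining client connections in front of the database.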

2) You have very new hardware and a very old kernel.  Once you've done the 
above, if you're still not happy with performance, at that point you 
should consider moving to a newer kernel.  It's fairly simple to build a Linux 
kernel using the same basic kernel parameters as the stock RedHat one. 
2.6.28 is six months old now, is up to 2.6.28.10, and has gotten a lot 
more testing than most kernels due to it being the Ubuntu 9.04 default. 
I'd suggest you try out that version.

3) A system with 128GB of RAM is in a funny place where using the 
defaults or the usual rules of thumb for a lot of parameters ("set 
shared_buffers to 1/4 of RAM") is a bad idea.  shared_buffers seems to 
top out its usefulness around 10GB on current generation 
hardware/software, and some Linux memory tunables have defaults on 2.6.18 
that are insane for your system; vm.dirty_ratio at 40 comes to mind as the 
one I run into most.  Some of that gets fixed just by moving to a newer 
kernel, some doesn't.  Again, these aren't the problems you're having now 
though; they're the ones you'll have in the future *if* you fix the more 
fundamental problems first.
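
For a sense of scale, here is a back-of-the-envelope sketch in Python of 
what those two defaults work out to on a 128GB machine. It assumes the 
percentage is applied to total RAM; as I understand it the kernel uses a 
somewhat smaller memory figure, so treat the result as a rough upper 
bound.  The helper name dirty_ceiling_bytes is hypothetical.

```python
# Back-of-the-envelope arithmetic only. dirty_ceiling_bytes is a hypothetical
# helper, and applying the percentage to total RAM is a simplifying assumption.

GIB = 1024 ** 3

def dirty_ceiling_bytes(ram_bytes: int, dirty_ratio_pct: int) -> int:
    """Approximate amount of dirty page cache allowed before writers block."""
    return ram_bytes * dirty_ratio_pct // 100

ram = 128 * GIB

ceiling_40 = dirty_ceiling_bytes(ram, 40)   # vm.dirty_ratio = 40
quarter_ram = ram // 4                      # the "1/4 of RAM" rule of thumb

print(f"dirty data allowed at 40%: {ceiling_40 / GIB:.1f} GiB")
print(f"shared_buffers at 1/4 of RAM: {quarter_ram / GIB:.0f} GiB")
```

Tens of gigabytes of unwritten data waiting on a single flush, and a 
shared_buffers setting roughly three times the size where it stops 
helping, are why the usual rules of thumb break down at this scale.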

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD
