On Thu, May 28, 2009 at 4:53 PM, Fabrix <fabrixio1@xxxxxxxxx> wrote:
Wow, that's some serious context-switching right there - 300k context
switches a second mean that the processors are spending a lot of their
time fighting for CPU time instead of doing any real work.
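
For anyone who wants to check the context-switch rate on their own box, here is a minimal sketch. It assumes a Linux host, where /proc/stat exposes a cumulative "ctxt" counter. The function names (read_ctxt, switches_per_second, sample_rate) are made up for illustration and do not come from any tool mentioned in this thread.

```python
import time


def read_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found in the supplied text")


def switches_per_second(before, after, elapsed_seconds):
    """Convert two cumulative counter readings into a per-second rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (after - before) / elapsed_seconds


def sample_rate(interval=1.0, path="/proc/stat"):
    """Sample the counter twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = read_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_ctxt(f.read())
    return switches_per_second(first, second, interval)
```

Calling sample_rate() on the database server while it is under load gives a number to compare against the 300k figure above.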
There is a bug in the quad-core chips: under a massive number of connections, all cores go to 100% utilization and no work gets done. I'm digging to find links, but if I remember correctly, the only way to fix it was to disable the 4th core in Linux (it involved some black magic in /proc). You really need to lower the number of processes you're forcing each processor bus to switch through (or switch to AMD's HyperTransport bus).
It appears that you have the server configured with a very high number
of connections as well? My first suggestion would be to look at a way
to limit the number of active connections to the server at a time
(pgPool or similar).
Yes, I have max_connections = 5000.
I can lower it, but I need at least 3500 connections.
Typically, it's a bad idea to run with anything over 1000 connections (many will suggest lower than that). If you need that many connections, you'll want to look at a connection pooler like pgBouncer or pgPool.
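
To illustrate the idea behind a pooler, here is a toy sketch in Python. It is not how pgBouncer or pgPool are implemented. It only shows many clients sharing a small, fixed set of connections, so the number active at once never exceeds the pool size. BoundedPool and run_clients are hypothetical names, and the "connections" are plain strings standing in for real backend connections.

```python
import queue
import threading
import time


class BoundedPool:
    """A fixed set of placeholder connections handed out one at a time."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(f"conn-{i}")  # stand-in for a real backend connection

    def acquire(self):
        # Blocks until some other client releases a connection.
        return self._idle.get()

    def release(self, conn):
        self._idle.put(conn)


def run_clients(num_clients, pool_size, work_seconds=0.001):
    """Run num_clients threads through the pool; return peak concurrent use."""
    pool = BoundedPool(pool_size)
    lock = threading.Lock()
    active = 0
    peak = 0

    def client():
        nonlocal active, peak
        conn = pool.acquire()
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            time.sleep(work_seconds)  # pretend to run a query
        finally:
            with lock:
                active -= 1
            pool.release(conn)

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

For example, run_clients(200, 10) pushes 200 clients through 10 connections, and the returned peak never exceeds 10. A real pooler does the same kind of multiplexing in front of the server, so client-side connection counts can stay high while the number of busy backends stays small.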
--Scott