For anyone who's still following: we tried upgrading to Postgres 9.3.3, but that hasn't helped.

Running an strace on the pid that was consuming the highest CPU at the time of the outage shows:

semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91881569, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(89325587, {{14, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90964037, {{4, 1, 0}}, 1) = 0
semop(90308657, {{5, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(88866821, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90439733, {{13, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90341426, {{2, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90308657, {{5, 1, 0}}, 1) = 0
semop(91881569, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(88866821, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91881569, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90865730, {{5, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(90865730, {{5, 1, 0}}, 1) = 0

I've seen other people talk about this problem with a lot of semop calls, but haven't yet found a clear solution. Does anyone have any ideas?

I've also downloaded the perf tool based on http://rhaas.blogspot.com/2012/06/perf-good-bad-ugly.html and will see what that shows.

Thanks,
Karthik

On 3/11/14 1:06 PM, "John R Pierce" <pierce@xxxxxxxxxxxx> wrote:

>On 3/11/2014 10:20 AM, Anand Kumar, Karthik wrote:
>> We typically see about 500-700 active queries at a time
>
>if these are primarily small/fast queries, like OLTP operations, and you
>DONT have 200-400 CPU cores on this server, you will likely find that if
>you use a queueing mechanism to only execute about 2X your CPU core
>count concurrently, you will get MORE total transactions/second than
>trying to do 500-700 at once.
>
>if your apps are using persistent connections, then the session pooling
>model won't do any good, you should use transaction pooling. you want
>the actual active query count to be tunable, probably down around 2X the
>cpu core count, depending on various things. some folks say, CPU
>cores/threads plus disk spindles is the optimal number.
>
>
>
>--
>john r pierce 37N 122W
>somewhere on the middle of the left coast
>
>
>
>--
>Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-general
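
As a rough way to quantify the strace output above, here is a minimal Python sketch that tallies semop calls by semaphore set, semaphore number, and operation. The names `tally_semops`, `summarize`, and `SAMPLE` are made up for illustration, and the regex assumes the `{{num, op, flg}}` argument format shown in this trace (other strace versions may print it differently). My understanding is that a `-1` operation is a decrement that can block and a `1` is an increment that can wake a waiter, so a high count of `-1` calls on a single semaphore would be consistent with this backend repeatedly sleeping on a contended lock, though that reading is an assumption rather than something the trace alone proves.

```python
import re
from collections import Counter

# Matches strace lines such as: semop(91521110, {{12, -1, 0}}, 1) = 0
# Assumption: the argument format is exactly the one seen in this trace.
SEMOP_RE = re.compile(r"semop\((\d+), \{\{(\d+), (-?\d+), \d+\}\}, \d+\)")

# A short excerpt of the trace above, used only as sample input.
SAMPLE = """\
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(91881569, {{12, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1) = 0
semop(89325587, {{14, 1, 0}}, 1) = 0
"""


def tally_semops(trace_text):
    """Count semop calls per (semid, sem_num, sem_op) in strace output."""
    counts = Counter()
    for semid, sem_num, sem_op in SEMOP_RE.findall(trace_text):
        counts[(int(semid), int(sem_num), int(sem_op))] += 1
    return counts


def summarize(counts):
    """Return (decrements, increments): total calls with sem_op < 0 and > 0."""
    decrements = sum(n for (_, _, op), n in counts.items() if op < 0)
    increments = sum(n for (_, _, op), n in counts.items() if op > 0)
    return decrements, increments


if __name__ == "__main__":
    counts = tally_semops(SAMPLE)
    for (semid, sem_num, sem_op), n in counts.most_common():
        print(f"semid={semid} sem_num={sem_num} sem_op={sem_op:+d} calls={n}")
    decrements, increments = summarize(counts)
    print(f"decrements={decrements} increments={increments}")
```

To run it against a real capture, read the strace output file into a string and pass it to `tally_semops` in place of `SAMPLE`.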