procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
51 0 0 123717 152 121785 0 0 4 34 1 1 20 3 77 0 0
228 0 0 123702 152 121786 0 0 0 720 134874 295787 79 11 10 0 0
232 0 0 123700 152 121786 0 0 0 7128 138700 314263 79 12 9 0 0
274 0 0 123691 152 121786 0 0 0 980 133590 245257 81 11 8 0 0
380 0 0 123789 152 121786 0 0 0 1008 133792 258339 80 11 9 0 0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
156 10 0 232743 45 16749 0 0 42 2 80 80 3 0 96 1 0
260 11 0 232639 45 16828 0 0 76704 2792 155594 223928 83 12 4 0 0
128 12 0 232552 45 16887 0 0 59536 2896 153932 229723 82 12 6 1 0
267 6 0 232480 45 16931 0 0 43144 2320 149501 200510 83 11 5 0 0
105 8 0 232408 45 16965 0 0 38016 2208 148905 185142 84 11 5 0 0
112 1 0 232339 45 17000 0 0 32640 2528 148390 183620 83 11 6 0 0
Again, if you want to see more of the data we've collected, see our post http://www.postgresql.org/message-id/CAJ+wzrb1qhz3xuoeSy5mo8i=E-5OO9Yvm6R+VxLBGaPB=uevqA@xxxxxxxxxxxxxx. We have shared collected information from the incident.
I'll also keep you updated with any new things we find and also how lowering the potential connections to our Postgres goes. We don't have plans of testing on CentOS 5.8.
On Thu, Jun 19, 2014 at 2:35 PM, Kevin Grittner <kgrittn@xxxxxxxxx> wrote:Completely agree on both diagnosis and proposed solution -- load
> "Vasudevan, Ramya" <ramya.vasudevan@xxxxxxxxxxxxxx> wrote:
>
>> On the waiting queries - When we reached 1500 connections, we
>> had 759 connections that were in active state (116 COMMIT, 238
>> INSERT, UPDATE 176, 57 AUTHENTICATION, 133 BIND. These active
>> INSERTS and UPDATES also includes the 80-90 waiting sessions (We
>> checked pg_stat_activity for 'waiting' state. And pg_locks for
>> granted=f). The blocking and the waiting queries were simple one
>> row updates/inserts/deletes. These shouldn’t be blocking each
>> other normally (unless, we think, there was a problem writing to
>> the disk). Correct me if I am wrong.
>
> You may want to consider this:
>
> http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
>
>> [vmstat show up to 173111 context switches per second, with high
>> cs rate corresponding to user CPU% between 64 and 82]
>
> That usually means there is high contention for spinlocks,
> potentially with processes getting suspended while holding
> spinlocks, making things worse.
profile (low iowait, high user%, high processes, high CS) is
symptomatic of too many processes trying to do things at once. So
there may be some application caused driver of the problem or you are
hitting a contention point within postgres (a 'perf top' might give
clues to the latter). Either way, once you are in this state you end
up with too many processes fighting for cpu and cache lines which
exaggerates the problem to the point you can classify it as an outage.
Be advised transaction mode pooling makes certain features of the
database difficult or impossible to use -- advisory locks (except xact
variants), server side prepared statements, asynchronous
notificiation, WITH HOLD cursors and the like -- basically anything
scoped to the session. For many workloads it is a high win though.
If for whatever reason this solution doesn't work, your other options
are to try to optimize whatever is causing the load event
(particularly if it's in your code -- careful query logging might give
some clues) or to simply upgrade hardware (more/faster cpu especially
for your case) to the point that even when highly loaded you're always
clearing queries at an acceptable rate. The hardware approach has
some risk though -- if you have a contention problem it's not always a
given that adding cores will scale as well as you think. Faster
core/bus is almost always a win, but obviously there's a very strict
limit you can go.
merlin
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general