On Wed, Feb 1, 2012 at 6:38 PM, Christopher Opena <counterveil@xxxxxxxxx> wrote: > Hello folks, > > We've been running into some very strange issues of late with our PostgreSQL > database(s). We have an issue where a couple of queries push high CPU on a > few of our processors and the entire database locks (reads, writes, console > cannot be achieved unless the high CPU query procs are killed). Further > investigation shows ~59% total cpu usage (we have 16 total cores), low io, > and mid-to-low memory usage (we have 74GB of memory, shared_buffers=16GB). > We had previously seen some high io problems but those turned out to be > unconnected and ultimately solved, yet we are still seeing a complete lock > of the DB occasionally as previously described. > > The queries themselves are not any different than normal usage on other > databases; they are pulling back a little more data but there's nothing that > stands out about them as far as query construction. > > One thing that we aren't sure of is whether or not we are running into a > general connection pooling issue. Our typical number of postgresql > processes fluctuates between 1,400 and 1,600 - most of which are idle - as > we have a number of application servers all connecting to a central > read/write master (the master replicates out to a secondary via streaming > replication). We have max_processes set to 3,000 after tweaking some kernel > memory parameters so at least we know we aren't exceeding that, but is there > a practical "real world" limit or issue with setting this too high? > > Ultimately, the problem we're seeing is a full read/write lock on a system > that is apparently at medium usage levels once we got rid of our high io red > herring. Honestly I'm a little stumped as to where to look next; is there > some specific metric I might be missing here? > > Any help is greatly appreciated, Random thoughts/suggestions: *) Look for some correlation between non-idle process count and locking situation. You are running a lot of processes and if I was in your shoes I would be strongly looking at pgbouncer to handle connection pooling. You could be binding in the database or (worse) the kernel *) Try logging checkpoints to see if there is any correlation with your locks. *) An strace of both the 'high cpu' process and one of the blocked process might give some clues -- in particular if you are being blocked on a system call *) Given enough time, do your high cpu queries ever complete? Are they writing or reading? *) What happens to overall system load if you lower shared_buffers to, say, 4gb? merlin -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general