Thomas Pöhler <tp@xxxxxxxxxxxxxxxxxxxxxxx> wrote:

> we are using two instances of pgbouncer v1.4 for connection
> pooling. One for prepared statements (pool_mode session) and one
> without (pool_mode transaction).

> max_client_conn = 10000
> default_pool_size = 450

Your best defense against the "thundering herd" issues you describe
would be to eliminate the session pool (if you can) and to drop the
default_pool_size for the transaction pool to the point where, at
peak, the number of backends actually busy is about twice your
number of *actual* cores. (Don't count hyperthreading "logical"
cores for this purpose.) max_client_conn can be as high as you
need; the point is for the connection pool to funnel the requests
through a much smaller pool of database connections. (A sketch of
what that might look like is in the P.S. below.)

> If I remember correctly vmstat showed lots of context switches
> during a peak above 50k.

Yeah, that's part of the reason throughput tanks when your active
connection count gets too high.

> We are running a biweekly downtime where we do a complete reindex
> and vacuum full. We cannot identify certain queries causing this.

If you really get bloat which requires VACUUM FULL, tracking down
the reason should be a high priority. You normally shouldn't need
to run that. Also, I hope that when you do run it, it is VACUUM
FULL followed by REINDEX, not the other way around. In fact, it
would probably be faster to CLUSTER (if you have room) or to drop
the indexes, run VACUUM FULL, and then create the indexes again;
both variants are sketched in the P.S.

> The last graph in ganglia
> (http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg) shows the
> avg_queries from pgbouncer's stats. I think this is a symptom of
> many waiting queries which accumulate.

While it seems counter-intuitive, you're likely to have fewer
queries waiting a long time there if you reduce default_pool_size,
so that contention doesn't kill performance when the queries *do*
get to run.

> max_connections 1000

This is what you need to try to reduce.

> max_prepared_transactions 5

If you're actually using prepared transactions, make sure none are
lingering about for a long time during these incidents. Well,
*ever*, really -- but I would definitely check during problem
periods. (There is a query for that in the P.S.)

> wal_buffers 1MB

You should bump this to 16MB.

> The database is fitting completely into ram

Then you should probably be adjusting seq_page_cost and
random_page_cost. You'll probably get plans which run faster, which
should help overall. (Example settings are in the P.S.)

-Kevin
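
P.S. To make the pool sizing concrete, here is roughly what the
transaction pool's section of pgbouncer.ini might look like if the
box had, say, 8 physical cores -- that core count is an assumption,
so substitute your own:

  [pgbouncer]
  pool_mode = transaction
  max_client_conn = 10000
  ; about 2 x physical cores; far below the current 450
  default_pool_size = 16

Clients still connect exactly as they do now; they simply queue
briefly inside pgbouncer instead of piling up as active backends in
PostgreSQL.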
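
For the prepared transaction check, a query along these lines
against the pg_prepared_xacts view will show anything that has been
sitting around:

  SELECT gid, prepared, owner, database
    FROM pg_prepared_xacts
   ORDER BY prepared;

Any row with an old "prepared" timestamp is holding locks and
keeping vacuum from cleaning up dead rows, and should be committed
or rolled back.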
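
For the bloat maintenance, either of these should beat VACUUM FULL
plus REINDEX -- the table, index, and column names here are made up
for illustration:

  -- Option 1: CLUSTER rewrites the table and rebuilds all of its
  -- indexes in one pass (needs room on disk for a second copy).
  CLUSTER some_table USING some_table_pkey;

  -- Option 2: drop the indexes so VACUUM FULL has less to do, then
  -- build them fresh afterward.
  DROP INDEX some_table_idx;
  VACUUM FULL some_table;
  CREATE INDEX some_table_idx ON some_table (some_column);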
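
And for the postgresql.conf side, the changes mentioned above might
end up looking something like this -- the 0.1 cost figures are only
a common starting point for a fully cached database, not something
derived from your workload, so treat them as an assumption to test:

  wal_buffers = 16MB
  seq_page_cost = 0.1
  random_page_cost = 0.1

With the two page costs equal and low, the planner stops assuming
random access is much more expensive than sequential access, which
is closer to the truth when everything is served from RAM.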