Thomas Pöhler <tp@xxxxxxxxxxxxxxxxxxxxxxx> wrote:

> we are using two instances of pgbouncer v1.4 for connection
> pooling. One for prepared statements (pool_mode session) and one
> without (pool_mode transaction).

> max_client_conn = 10000
> default_pool_size = 450

Your best defense against the "thundering herd" issues you describe
would be to eliminate the session pool (if you can) and to drop the
default_pool_size for the transaction pool to the point where, at
peak, the number of backends actually busy is about twice your
number of *actual* cores. (Don't count hyperthreading "logical"
cores for this purpose.) max_client_conn can be as high as you
need; the point is for the connection pool to funnel the requests
through a much smaller pool of database connections. (A sketch of
what that might look like is in the P.S. below.)

> If I remember correctly vmstat showed lots of context switches
> during a peak above 50k.

Yeah, that's part of the reason throughput tanks when your active
connection count gets too high.

> We are running a biweekly downtime where we do a complete reindex
> and vacuum full. We cannot identify certain queries causing this.

If you really get bloat which requires VACUUM FULL, tracking down
the reason should be a high priority. You normally shouldn't need
to run that. Also, I hope that when you do run it, it is VACUUM
FULL followed by REINDEX, not the other way around. In fact, it
would probably be faster to CLUSTER (if you have room) or to drop
the indexes, run VACUUM FULL, and then create the indexes again;
both variants are sketched in the P.S.

> The last graph in ganglia
> (http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg) shows the
> avg_queries from pgbouncer's stats. I think this is a symptom of
> many waiting queries which accumulate.

While it seems counter-intuitive, you're likely to have fewer
queries waiting a long time there if you reduce default_pool_size,
so that contention doesn't kill performance when the queries *do*
get to run.

> max_connections 1000

This is what you need to try to reduce.

> max_prepared_transactions 5

If you're actually using prepared transactions, make sure none are
lingering about for a long time during these incidents. Well,
*ever*, really -- but I would definitely check during problem
periods. (There is a query for that in the P.S.)

> wal_buffers 1MB

You should bump this to 16MB.

> The database is fitting completely into ram

Then you should probably be adjusting seq_page_cost and
random_page_cost. You'll probably get plans which run faster, which
should help overall. (Example settings are in the P.S.)

-Kevin
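
P.S. To make the pool sizing concrete, here is roughly what the
transaction pool's section of pgbouncer.ini might look like if the
box had, say, 8 physical cores -- that core count is an assumption,
so substitute your own:

  [pgbouncer]
  pool_mode = transaction
  max_client_conn = 10000
  ; about 2 x physical cores; far below the current 450
  default_pool_size = 16

Clients still connect exactly as they do now; they simply queue
briefly inside pgbouncer instead of piling up as active backends in
PostgreSQL.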
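
For the prepared transaction check, a query along these lines
against the pg_prepared_xacts view will show anything that has been
sitting around:

  SELECT gid, prepared, owner, database
    FROM pg_prepared_xacts
   ORDER BY prepared;

Any row with an old "prepared" timestamp is holding locks and
keeping vacuum from cleaning up dead rows, and should be committed
or rolled back.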
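
For the bloat maintenance, either of these should beat VACUUM FULL
plus REINDEX -- the table, index, and column names here are made up
for illustration:

  -- Option 1: CLUSTER rewrites the table and rebuilds all of its
  -- indexes in one pass (needs room on disk for a second copy).
  CLUSTER some_table USING some_table_pkey;

  -- Option 2: drop the indexes so VACUUM FULL has less to do, then
  -- build them fresh afterward.
  DROP INDEX some_table_idx;
  VACUUM FULL some_table;
  CREATE INDEX some_table_idx ON some_table (some_column);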
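
And for the postgresql.conf side, the changes mentioned above might
end up looking something like this -- the 0.1 cost figures are only
a common starting point for a fully cached database, not something
derived from your workload, so treat them as an assumption to test:

  wal_buffers = 16MB
  seq_page_cost = 0.1
  random_page_cost = 0.1

With the two page costs equal and low, the planner stops assuming
random access is much more expensive than sequential access, which
is closer to the truth when everything is served from RAM.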