Re: How to optimize select count(*)..group by?

Greg Stark <gsstark@xxxxxxx> · 28 Jul 2005 13:22:04 -0400

Bryan Field-Elliot <bryan_lists@xxxxxxxxxxx> writes:

> We have this simple query:
> 
> select status, count(*) from customer group by status;
> 
> There is already a btree index on status, but, the customer table is
> huge, and this query must be executed very frequently... an "explain" on
> this query shows that it is quite costly (and we notice it runs
> slowly)...
> 
> Can someone recommend the best technique to optimize this? We can create
> new indices, we can re-write this query.. But we'd rather not add new
> tables or columns if possible (not just to solve this problem).

Without creating a new table you have three options, none of which will be
very fast:

1) If the entire table fits in ram then you could try setting random_page_cost
   close enough to 1 to cause the index to be used.

2) If it doesn't but there are a reasonably small number of distinct values of
   status you would hope to see a sequential scan and a hash aggregate being
   used. I would expect this to be what you would see with the default
   configuration. If not you might have to increase work_mem (or sort_mem
   depending on your version I think).

3) If you have a large table with a large set of status values then Postgres
   may be compelled to resort the entire table. In which case you should
   experiment with work_mem/sort_mem to get the largest value you can without
   inducing swapping. You could also experiment with putting pgsql_tmp on
   separate spindles.

None of these will be fast enough for OLTP access like a web page. You would
be best off performing this query periodically and stuffing the results in a
table for quick access.

-- 
greg

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
       message can get through to the mailing list cleanly