Re: Potential performance issues related to group by and covering index

David Rowley <dgrowleyml@xxxxxxxxx> · Wed, 3 Mar 2021 10:41:24 +1300

On Wed, 3 Mar 2021 at 10:04, Michael Lewis <mlewis@xxxxxxxxxxx> wrote:
> Are there guidelines or principles you could share about writing the group by clause such that it is more efficient?

If you have the option of writing them in the same order as an
existing btree index that covers the entire GROUP BY clause (in
version < PG13) or at least prefix of the GROUP BY clause (version >=
PG13), then the planner has a chance to make use of that index to
provide pre-sorted input to do group aggregate.

Since PG13 has Incremental Sort, having an index that covers only a
prefix of the GROUP BY clause may still help.

If no indexes exist then you might get better performance by putting
the most distinct column first.  That's because sorts don't need to
compare the remaining columns once it receives two different values
for one column.  That gets more complex when the most distinct column
is wider than the others. e.g a text compare is more expensive than
comparing two ints.  For Hash Aggregate, I don't think the order will
matter much.

David