In the original example it looks like using the index (and not running a
parallel query) is what made the query slow. The fast version was
brute-force seqscan(s) + sort with 3 parallel backends (leader + 2
workers) sharing the work.

On Tue, Mar 2, 2021 at 10:42 PM David Rowley <dgrowleyml@xxxxxxxxx> wrote:
>
> On Wed, 3 Mar 2021 at 10:04, Michael Lewis <mlewis@xxxxxxxxxxx> wrote:
> > Are there guidelines or principles you could share about writing the
> > group by clause such that it is more efficient?
>
> If you have the option of writing them in the same order as an
> existing btree index that covers the entire GROUP BY clause (in
> versions < PG13) or at least a prefix of the GROUP BY clause (versions
> >= PG13), then the planner has a chance to make use of that index to
> provide pre-sorted input for a group aggregate.
>
> Since PG13 has Incremental Sort, having an index that covers only a
> prefix of the GROUP BY clause may still help.
>
> If no indexes exist then you might get better performance by putting
> the most distinct column first. That's because sorts don't need to
> compare the remaining columns once they receive two different values
> for one column. That gets more complex when the most distinct column
> is wider than the others, e.g. a text compare is more expensive than
> comparing two ints. For Hash Aggregate, I don't think the order will
> matter much.
>
> David
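
For what it's worth, David's points can be sketched roughly like this
(table, column, and index names here are made up for illustration; the
planner may still pick a Seq Scan + HashAggregate if the table is small
or statistics suggest it's cheaper):

```sql
-- Hypothetical table and a btree index we want the GROUP BY to line up with.
CREATE TABLE orders (customer_id int, order_date date, amount numeric);
CREATE INDEX orders_cust_date_idx ON orders (customer_id, order_date);

-- GROUP BY columns listed in the same order as the index: the planner
-- can feed pre-sorted index output into a GroupAggregate, avoiding a Sort.
EXPLAIN
SELECT customer_id, order_date, sum(amount)
FROM orders
GROUP BY customer_id, order_date;

-- On PG13+, grouping on extra trailing columns beyond the index can still
-- benefit: Incremental Sort only has to sort within each
-- (customer_id, order_date) group rather than the whole input.
EXPLAIN
SELECT customer_id, order_date, amount, count(*)
FROM orders
GROUP BY customer_id, order_date, amount;

-- With no usable index, putting the most distinct column first (say
-- order_date varies more than customer_id) can make the explicit Sort
-- cheaper, since most comparisons are decided on the first column.
EXPLAIN
SELECT order_date, customer_id, count(*)
FROM orders
GROUP BY order_date, customer_id;
```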