In the original example it looks like using the index (and not running a
parallel query) is what made the query slow. The fast version was
brute-force seqscan(s) + sort with 3 parallel backends (leader + 2
workers) sharing the work.

On Tue, Mar 2, 2021 at 10:42 PM David Rowley <dgrowleyml@xxxxxxxxx> wrote:
>
> On Wed, 3 Mar 2021 at 10:04, Michael Lewis <mlewis@xxxxxxxxxxx> wrote:
> > Are there guidelines or principles you could share about writing the
> > group by clause such that it is more efficient?
>
> If you have the option of writing them in the same order as an
> existing btree index that covers the entire GROUP BY clause (in
> versions < PG13) or at least a prefix of the GROUP BY clause (versions
> >= PG13), then the planner has a chance to make use of that index to
> provide pre-sorted input for a group aggregate.
>
> Since PG13 has Incremental Sort, having an index that covers only a
> prefix of the GROUP BY clause may still help.
>
> If no indexes exist then you might get better performance by putting
> the most distinct column first. That's because sorts don't need to
> compare the remaining columns once they receive two different values
> for one column. That gets more complex when the most distinct column
> is wider than the others, e.g. a text compare is more expensive than
> comparing two ints. For Hash Aggregate, I don't think the order will
> matter much.
>
> David
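
For what it's worth, David's points can be sketched roughly like this
(table, column, and index names here are made up for illustration; the
planner may still pick a Seq Scan + HashAggregate if the table is small
or statistics suggest it's cheaper):

```sql
-- Hypothetical table and a btree index we want the GROUP BY to line up with.
CREATE TABLE orders (customer_id int, order_date date, amount numeric);
CREATE INDEX orders_cust_date_idx ON orders (customer_id, order_date);

-- GROUP BY columns listed in the same order as the index: the planner
-- can feed pre-sorted index output into a GroupAggregate, avoiding a Sort.
EXPLAIN
SELECT customer_id, order_date, sum(amount)
FROM orders
GROUP BY customer_id, order_date;

-- On PG13+, grouping on extra trailing columns beyond the index can still
-- benefit: Incremental Sort only has to sort within each
-- (customer_id, order_date) group rather than the whole input.
EXPLAIN
SELECT customer_id, order_date, amount, count(*)
FROM orders
GROUP BY customer_id, order_date, amount;

-- With no usable index, putting the most distinct column first (say
-- order_date varies more than customer_id) can make the explicit Sort
-- cheaper, since most comparisons are decided on the first column.
EXPLAIN
SELECT order_date, customer_id, count(*)
FROM orders
GROUP BY order_date, customer_id;
```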