Re: Avoid sorting when doing an array_agg

Peter Geoghegan <pg@xxxxxxx> · Sun, 4 Dec 2016 16:57:18 -0800

On Sun, Dec 4, 2016 at 4:09 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Of course, we would also have to teach cost_sort or someplace near there
> that non-C sorting is much more expensive than C-collation sorting.  Not
> sure about exactly how to set that up without it being a kluge.

We've talked about that before, in the context of parallel query. At
the 2014 developer meeting, IIRC.

> A related problem is that if you have "GROUP BY x,y" and no particular
> ORDER BY requirement, you could sort by either x,y or y,x before the
> GroupAgg.  This would matter if, say, there was an index matching one
> but not the other.  Right now we're very stupid and only consider x,y,
> but if there were room to consider more than one set of target pathkeys
> it would be fairly simple to make that better.

That sounds valuable, especially because it seems natural to make the
leading group-on var the least selective within a GROUP BY. An index
that matches the order as written, and can therefore be used, might be
less common in practice, at least unless and until the partial sort
patch is committed.
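
To make the idea concrete, here is a minimal sketch (Python, not planner
code) of what considering more than one set of target pathkeys could
look like: try each ordering of the GROUP BY columns and keep the ones
that are a leading prefix of some index. The function name, the index
name "t_y_x_idx" and the dict layout are all hypothetical, made up for
illustration only.

```python
from itertools import permutations


def usable_orderings(group_cols, indexes):
    """Return (ordering, index_name) pairs where the ordering of the
    GROUP BY columns is a leading prefix of the index's column list.

    Hypothetical helper for illustration; it does not mirror any
    PostgreSQL planner data structure.
    """
    matches = []
    for ordering in permutations(group_cols):
        for name, cols in indexes.items():
            if tuple(cols[:len(ordering)]) == ordering:
                matches.append((ordering, name))
    return matches


# Assumed example: one index on (y, x), query says GROUP BY x, y.
indexes = {"t_y_x_idx": ("y", "x")}
matches = usable_orderings(("x", "y"), indexes)
print(matches)
```

Considering only the order as written (x, y) finds nothing here, while
trying both orderings finds that the index can supply (y, x). The number
of permutations grows factorially, so a real implementation would
presumably have to limit how many orderings it tries.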

I tend to write "GROUP BY country, province, city" -- never
"GROUP BY city, province, country". I speak a language that is written
left-to-right; I bet that if I spoke one written right-to-left, both
the writing direction and the column order would be reversed, which
amounts to the same thing. This might be a very prevalent habit. In
general, a tuplesort will be faster with a high cardinality leading
attribute, so this habit works against tuplesort. (This assumes a
leading attribute of pass-by-value type, or one with abbreviated key
support.)
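
One plausible way to see the effect is to count how often a two-column
comparator has to fall through to the second column. The sketch below
is a toy model in Python, not tuplesort itself; the row counts, the
cardinalities (5 vs. 100,000) and the helper name are assumptions
picked for illustration.

```python
import random
from functools import cmp_to_key


def count_tiebreaks(rows):
    """Sort (a, b) tuples and return (total comparisons, comparisons
    that tied on column a and had to look at column b)."""
    stats = {"total": 0, "tiebreak": 0}

    def cmp(r1, r2):
        stats["total"] += 1
        if r1[0] != r2[0]:
            # Leading column decides; the second column is never read.
            return -1 if r1[0] < r2[0] else 1
        stats["tiebreak"] += 1
        return (r1[1] > r2[1]) - (r1[1] < r2[1])

    sorted(rows, key=cmp_to_key(cmp))
    return stats["total"], stats["tiebreak"]


rng = random.Random(0)  # fixed seed so runs are repeatable
# Column 0 has 5 distinct values, column 1 has up to 100,000.
rows = [(rng.randrange(5), rng.randrange(100_000)) for _ in range(10_000)]

low_first = count_tiebreaks(rows)                          # low cardinality leads
high_first = count_tiebreaks([(b, a) for a, b in rows])    # high cardinality leads

print("low-cardinality first :", low_first)
print("high-cardinality first:", high_first)
```

With the low cardinality column first, a large share of comparisons tie
on the leading column and must examine the second one; with the high
cardinality column first, almost none do. When the leading comparison
is cheap (pass-by-value, or an abbreviated key), resolving most
comparisons there is where the saving would come from.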

-- 
Peter Geoghegan
