Re: Huge Data sets, simple queries

Tom Lane <tgl@xxxxxxxxxxxxx> · Sat, 28 Jan 2006 13:55:08 -0500

I wrote:
> (We might need to tweak the planner to discourage selecting
> HashAggregate in the presence of DISTINCT aggregates --- I don't
> remember whether it accounts for the sortmem usage in deciding
> whether the hash will fit in memory or not ...)

Ah, I take that all back after checking the code: we don't use
HashAggregate at all when there are DISTINCT aggregates, precisely
because of this memory-blow-out problem.
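
To illustrate the blow-out, here is a toy Python sketch. It is not the executor's code, and the function name and sample rows are made up. A hashed aggregate with a DISTINCT aggregate would have to keep one set of already-seen values per group, with every group's set live at once until the input is exhausted:

```python
from collections import defaultdict

def hashed_count_distinct(rows):
    """Toy model of count(DISTINCT value) ... GROUP BY key done by hashing.

    rows is an iterable of (key, value) pairs in arbitrary order.
    Returns (counts per key, number of values held in memory at the end).
    """
    seen = defaultdict(set)  # one set per group, all held simultaneously
    for key, value in rows:
        seen[key].add(value)
    # Every distinct (key, value) pair in the input is still in memory here.
    held = sum(len(values) for values in seen.values())
    counts = {key: len(values) for key, values in seen.items()}
    return counts, held

# Hypothetical sample input: (month, some id)
rows = [("2006-01", 7), ("2006-02", 7), ("2006-01", 9),
        ("2006-01", 7), ("2006-02", 3)]
counts, held = hashed_count_distinct(rows)
```

The state held grows with the number of distinct (group, value) pairs in the whole input, not with the number of groups, which is what a hash-fits-in-memory estimate would normally be based on.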

For both your group-by-date query and the original group-by-month query,
the plan of attack is going to be to read the original input in grouping
order (either via sort or indexscan, with sorting probably preferred
unless the table is pretty well correlated with the index) and then
sort/uniq on the DISTINCT value within each group.  The OP is probably
losing on that step compared to your test because his groups are much
larger than yours, forcing some spill to disk.  And most likely he's not
got an index on month, so the first sort is in fact a sort and not an
indexscan.
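
As a rough model of that plan, here is a Python sketch (not PostgreSQL source; the function name and the (key, value) row shape are assumptions for illustration): sort the input into grouping order, then sort and uniq the DISTINCT column one group at a time.

```python
from itertools import groupby
from operator import itemgetter

def sorted_count_distinct(rows):
    """Toy model of count(DISTINCT value) ... GROUP BY key via sorting.

    rows is an iterable of (key, value) pairs in arbitrary order.
    Returns a list of (key, distinct_count) in key order.
    """
    # Step 1: bring the input into grouping order.
    # (An index scan on the grouping column would make this sort unnecessary.)
    ordered = sorted(rows, key=itemgetter(0))

    result = []
    for key, group in groupby(ordered, key=itemgetter(0)):
        # Step 2: sort the DISTINCT column within this one group ...
        values = sorted(value for _, value in group)
        # ... then "uniq": count runs of equal adjacent values.
        distinct = sum(1 for _ in groupby(values))
        result.append((key, distinct))
    return result

# Hypothetical sample input: (month, some id)
rows = [("2006-01", 7), ("2006-02", 7), ("2006-01", 9),
        ("2006-01", 7), ("2006-02", 3)]
per_month = sorted_count_distinct(rows)
```

Only one group's values are held for step 2 at a time, but that per-group sort grows with the group size, so grouping by month gives it far more to chew on than grouping by date.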

Bottom line is that he's probably doing a ton of on-disk sorting
where you're not doing any.  This makes me think Luke's theory about
inadequate disk horsepower may be on the money.

			regards, tom lane