Tomas Vondra <tv@xxxxxxxx> writes: > I've managed to get the data to a different machine, and I've spent some > time on debugging it. Great, thanks for looking into it! > It seems that the difference is in evaluating hashentrysize > [ choose_hashed_distinct omits hash_agg_entry_size() ] > but the hashentrysize size is 24 (choose_hashed_distinct) or 56 > (choose_hashed_grouping). This causes that while _distinct evaluates the > condition as false, and _grouping as true (and thus returns false). Hah. > Now, the difference between 24 and 56 is caused by hash_agg_entry_size. > It's called with numAggs=0 but returns 32. I'm wondering if it should > return 0 in such cases, i.e. something like this: No, I don't think so. I'm pretty sure the reason choose_hashed_distinct is like that is that I subconsciously assumed hash_agg_entry_size would produce zero for numAggs = 0; but in fact it does not and should not, because there's still some overhead for the per-group hash entry whether or not there's any aggregates. So the right fix is that choose_hashed_distinct should add hash_agg_entry_size(0) onto its hashentrysize estimate. A separate issue is that the use of numAggs-1 in hash_agg_entry_size's calculations seems a bit risky if numAggs can be zero - I'm not sure we can rely on compilers to get that right. I'm inclined to replace that with use of offsetof. Likewise in build_hash_table. > I've tested that after this both queries use HashAggregate (which is the > right choice), but I haven't done any extensive checking so maybe I'm > missing something. It might be the preferable choice in this example, but you're looking at an edge case. If you want the thing to be using a hash aggregate for this size of problem, you should increase work_mem. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance