On 14.8.2013 20:35, Tom Lane wrote: > "Tomas Vondra" <tv@xxxxxxxx> writes: >> I've run into a strange plan difference on 9.1.9 - the first query >> does "DISTINCT" by doing a GROUP BY on the columns (both INT). ... >> Now, this takes ~45 seconds to execute, but after rewriting the >> query to use the regular DISTINCT it suddenly switches to >> HashAggregate with ~1/3 the cost (although it produces the same >> output, AFAIK), and it executes in ~15 seconds. > > [ scratches head... ] I guess you're running into some corner case > where choose_hashed_grouping and choose_hashed_distinct make > different choices. It's going to be tough to debug without a test > case though. I couldn't reproduce the behavior in a few tries here. > >> BTW I can't test this on 9.2 or 9.3 easily, as this is our >> production environment and I can't just export the data. I've tried >> to simulate this but so far no luck. > > I suppose they won't yet you step through those two functions with a > debugger either ... OK, this time the complete message ... I've managed to get the data to a different machine, and I've spent some time on debugging it. It seems that the difference is in evaluating hashentrysize - while choose_hashed_distinct does this: /* * Don't do it if it doesn't look like the hashtable will fit into * work_mem. */ hashentrysize = MAXALIGN(path_width) + MAXALIGN(sizeof(MinimalTupleData)); if (hashentrysize * dNumDistinctRows > work_mem * 1024L) return false; while choose_hashed_grouping does this: /* Estimate per-hash-entry space at tuple width... */ hashentrysize = MAXALIGN(path_width) + MAXALIGN(sizeof(MinimalTupleData)); /* plus space for pass-by-ref transition values... */ hashentrysize += agg_costs->transitionSpace; /* plus the per-hash-entry overhead */ hashentrysize += hash_agg_entry_size(agg_costs->numAggs); if (hashentrysize * dNumGroups > work_mem * 1024L) return false; In both cases the common parameter values are dNumGroups = dNumDistinctRows = 20451018 work_mem = 819200 but the hashentrysize size is 24 (choose_hashed_distinct) or 56 (choose_hashed_grouping). This causes that while _distinct evaluates the condition as false, and _grouping as true (and thus returns false). Now, the difference between 24 and 56 is caused by hash_agg_entry_size. It's called with numAggs=0 but returns 32. I'm wondering if it should return 0 in such cases, i.e. something like this: Size hash_agg_entry_size(int numAggs) { Size entrysize; if (numAggs == 0) return 0; /* This must match build_hash_table */ entrysize = sizeof(AggHashEntryData) + (numAggs - 1) * sizeof(AggStatePerGroupData); entrysize = MAXALIGN(entrysize); /* Account for hashtable overhead (assuming fill factor = 1) */ entrysize += 3 * sizeof(void *); return entrysize; } I've tested that after this both queries use HashAggregate (which is the right choice), but I haven't done any extensive checking so maybe I'm missing something. Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance