Re: queries with DISTINCT / GROUP BY giving different plans

Tomas Vondra <tv@xxxxxxxx> · Fri, 16 Aug 2013 21:36:19 +0200

On 14.8.2013 20:35, Tom Lane wrote:
> "Tomas Vondra" <tv@xxxxxxxx> writes:
>> I've run into a strange plan difference on 9.1.9 - the first query
>> does "DISTINCT" by doing a GROUP BY on the columns (both INT). ... 
>> Now, this takes ~45 seconds to execute, but after rewriting the
>> query to use the regular DISTINCT it suddenly switches to
>> HashAggregate with ~1/3 the cost (although it produces the same
>> output, AFAIK), and it executes in ~15 seconds.
> 
> [ scratches head... ]  I guess you're running into some corner case
> where choose_hashed_grouping and choose_hashed_distinct make
> different choices. It's going to be tough to debug without a test
> case though.  I couldn't reproduce the behavior in a few tries here.
> 
>> BTW I can't test this on 9.2 or 9.3 easily, as this is our
>> production environment and I can't just export the data. I've tried
>> to simulate this but so far no luck.
> 
> I suppose they won't yet you step through those two functions with a 
> debugger either ...

OK, this time the complete message ...

I've managed to get the data to a different machine, and I've spent some
time on debugging it. It seems that the difference is in evaluating
hashentrysize - while choose_hashed_distinct does this:

  /*
   * Don't do it if it doesn't look like the hashtable will fit into
   * work_mem.
   */
  hashentrysize = MAXALIGN(path_width)
                + MAXALIGN(sizeof(MinimalTupleData));

  if (hashentrysize * dNumDistinctRows > work_mem * 1024L)
      return false;

while choose_hashed_grouping does this:

  /* Estimate per-hash-entry space at tuple width... */
  hashentrysize = MAXALIGN(path_width)
                + MAXALIGN(sizeof(MinimalTupleData));
  /* plus space for pass-by-ref transition values... */
  hashentrysize += agg_costs->transitionSpace;
  /* plus the per-hash-entry overhead */
  hashentrysize += hash_agg_entry_size(agg_costs->numAggs);

  if (hashentrysize * dNumGroups > work_mem * 1024L)
    return false;

In both cases the common parameter values are

  dNumGroups = dNumDistinctRows = 20451018
  work_mem = 819200

but the hashentrysize size is 24 (choose_hashed_distinct) or 56
(choose_hashed_grouping). This causes that while _distinct evaluates the
condition as false, and _grouping as true (and thus returns false).

Now, the difference between 24 and 56 is caused by hash_agg_entry_size.
It's called with numAggs=0 but returns 32. I'm wondering if it should
return 0 in such cases, i.e. something like this:

    Size
    hash_agg_entry_size(int numAggs)
    {
        Size        entrysize;

        if (numAggs == 0)
            return 0;

        /* This must match build_hash_table */
        entrysize = sizeof(AggHashEntryData) +
            (numAggs - 1) * sizeof(AggStatePerGroupData);
        entrysize = MAXALIGN(entrysize);
        /* Account for hashtable overhead (assuming fill factor = 1) */
        entrysize += 3 * sizeof(void *);
        return entrysize;
    }

I've tested that after this both queries use HashAggregate (which is the
right choice), but I haven't done any extensive checking so maybe I'm
missing something.

Tomas

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance