Hi,

> On 20. Aug 2019, at 19:32, Andres Freund <andres@xxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2019-08-20 17:11:58 +0200, Felix Geisendörfer wrote:
>>
>> HashAggregate (cost=80020.01..100020.01 rows=2000000 width=8) (actual time=19.349..23.123 rows=1 loops=1)
>
> FWIW, that's not a mis-estimate I'm getting on master ;). Obviously
> that doesn't actually address your concern...

I suppose this is thanks to the new optimizer support functions
mentioned by Michael and Pavel?

Of course I'm very excited about those improvements, but yeah, my real
query is being mis-estimated for totally different reasons that don't
involve any SRFs.

>> I'm certainly a novice when it comes to PostgreSQL internals, but I'm
>> wondering if this could be fixed by taking a more dynamic approach for
>> allocating HashAggregate hash tables?
>
> Under-sizing the hashtable just out of caution will add overhead to
> a lot more common cases. That requires copying data around during
> growth, which is far far from free. Or you can use hashtables that don't
> need to copy, but they're also considerably slower in the more common
> cases.

How does PostgreSQL currently handle the case where the initial hash
table is under-sized due to the planner having underestimated things?
Are the growth costs amortized by using an exponential growth function?

Anyway, I can accept that my situation is an edge case that doesn't
justify making things more complicated.

>> 3. Somehow EXPLAIN gets confused by this and only ends up tracking 23ms of the query execution instead of 45ms [5].
>
> Well, there's plenty work that's not attributed to nodes. IIRC we don't
> track executor startup/shutdown overhead on a per-node basis.

I didn't know that, thanks for clarifying : ).
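
P.S. For what it's worth, here is a toy sketch of the kind of amortization
I'm asking about above. This is not PostgreSQL's actual hash table code,
just an illustration of an open-addressing table that doubles its capacity
and re-inserts every entry on growth (the "copying data around" cost), with
doubling keeping that cost amortized:

/*
 * Toy sketch (my mental model, not PostgreSQL's implementation): a hash
 * table of uint64 keys that doubles its capacity whenever the load factor
 * gets too high.  Growing re-inserts every existing entry into the new
 * array, but because capacity doubles each time, each key is moved only
 * O(1) times on average over the life of the table.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    uint64_t *keys;      /* 0 means "empty slot" in this toy example */
    size_t    capacity;  /* always a power of two */
    size_t    used;
} toytab;

static uint64_t
toy_hash(uint64_t k)
{
    /* cheap mixer; good enough for a demo */
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    return k;
}

static void
toy_insert_nogrow(toytab *t, uint64_t key)
{
    size_t mask = t->capacity - 1;
    size_t i = toy_hash(key) & mask;

    /* linear probing; the table is never allowed to fill up completely */
    while (t->keys[i] != 0 && t->keys[i] != key)
        i = (i + 1) & mask;
    if (t->keys[i] == 0)
    {
        t->keys[i] = key;
        t->used++;
    }
}

static void
toy_grow(toytab *t)
{
    uint64_t *oldkeys = t->keys;
    size_t    oldcap = t->capacity;

    /* exponential growth: double the capacity ... */
    t->capacity *= 2;
    t->keys = calloc(t->capacity, sizeof(uint64_t));
    t->used = 0;

    /* ... and re-insert (copy) every existing entry */
    for (size_t i = 0; i < oldcap; i++)
        if (oldkeys[i] != 0)
            toy_insert_nogrow(t, oldkeys[i]);
    free(oldkeys);
}

static void
toy_insert(toytab *t, uint64_t key)
{
    /* grow before the table gets too dense (load factor ~0.75 here) */
    if (t->used + 1 > t->capacity - t->capacity / 4)
        toy_grow(t);
    toy_insert_nogrow(t, key);
}

int
main(void)
{
    toytab t = { calloc(16, sizeof(uint64_t)), 16, 0 };

    /*
     * Simulate a bad planner estimate: start with only 16 slots, then
     * insert a million keys.  The table grows many times, yet the total
     * number of re-inserted entries stays a small constant multiple of
     * the final entry count, so the per-insert overhead is O(1) amortized.
     */
    for (uint64_t k = 1; k <= 1000000; k++)
        toy_insert(&t, k);
    printf("entries: %zu, capacity: %zu\n", t.used, t.capacity);
    free(t.keys);
    return 0;
}

The point being that even when the table starts out badly under-sized,
doubling keeps the total re-insertion work proportional to the final number
of entries, which is why I was wondering whether the copying overhead is
really that painful in the underestimated case.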