On Tue, Mar 9, 2021 at 8:13 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Radoslav Nedyalkov <rnedyalkov@xxxxxxxxx> writes:
>> On Tue, Mar 9, 2021 at 6:53 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>>> The only thing I'm aware of that could consume unbounded memory
>>> on the server side is hash aggregation. (v13 has improved that
>>> situation, but I'm guessing you are running some older version.)
>>> The planner won't choose hash aggregation if it estimates that
>>> the hash table would need to exceed work_mem, but its estimate
>>> could be wrong.
>
>> Query and plan attached.
>
> Yeah, so, this looks suspicious:
>
>    ->  HashAggregate  (cost=181320662.52..181323190.52 rows=252800 width=16)
>          Group Key: gsm.merchant_id, cer.date
>          ->  Merge Left Join  (cost=161250580.17..170174828.82 rows=891666696 width=71)
>
> How accurate is that estimate for the number of merchant_id/date
> groups? Is the estimate for the size of the input join reasonable?
>
> Assuming this is the problem, a brute-force fix could be to turn off
> enable_hashagg. But it'd be better to try to get the planner's
> estimates more in line with reality.
>
> regards, tom lane
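For reference, the brute-force route can be tried per session, without
touching the server configuration. A minimal sketch (plain psql, nothing
specific to our setup):

    -- disable hash aggregation for the current session only
    SET enable_hashagg = off;

    -- the memory budget the planner compares its hash-table estimate against
    SHOW work_mem;

    -- ... re-run the query here, then restore the default:
    RESET enable_hashagg;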
As for the estimates: merely multiplying the row counts of the join inputs
(N*M) gives about 1.4B rows, plus some further multiplication from the left
join. That might be the "surprising" part: the 891,666,696 rows in the plan
look like an underestimate. The team has already started rewriting the query.
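To sanity-check the 252,800-group estimate, the actual number of
merchant_id/date combinations could be counted directly. A sketch, with gsm
and cer standing in for the real tables, and a made-up join condition that
would have to be replaced with the one from the query:

    -- count the real number of grouping keys the HashAggregate would see
    SELECT count(*)
    FROM (
        SELECT DISTINCT gsm.merchant_id, cer.date
        FROM gsm
        LEFT JOIN cer ON cer.merchant_id = gsm.merchant_id  -- hypothetical join key
    ) AS g;

A result far above 252,800 would mean the hash table was always going to
outgrow work_mem.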
Thanks for pointing out HashAggregate as a potential danger.
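For completeness, on v13 and later the failure mode changes, per your note
above: hash aggregation spills to disk once it exceeds its memory budget
instead of growing without bound. The relevant settings can be inspected with:

    SHOW server_version;
    SHOW work_mem;
    SHOW hash_mem_multiplier;  -- v13+: scales work_mem for hash-based operations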
Thanks and regards,
Rado