RE: Postgres12 looking for possible HashAggregate issue workarounds?

João Paulo Luís <joao.luis@xxxxxxxxx> · Fri, 16 Dec 2022 17:47:07 +0000

Thank you. It seems it is precisely that problem.

(I will discuss with the rest of the team upgrade possibilities, as I guess it will never be backported
 to the bugfixes of version 12.)

Meanwhile, as a one-time workaround I've disabled the hashagg algorithm,

SET enable_hashagg=off;

repeated the query, and it finished in 1h28m (and the RAM resident memory stayed
 just a little above the 16GB of shared_buffers).

Happy holidays!

João Luís
Senior Developer
joao.luis@xxxxxxxxx 
+351 210 337 700

Confidentiality
The information in this message is confidential and privileged. It is intended solely for the addressee. If you are not the intended recipient, any disclosure, copying, or distribution of the message, or any action or omission taken by you in reliance on it is prohibited. 
Please contact the sender immediately if you have received this message by mistake.
Thank you for your cooperation.

De: Justin Pryzby <pryzby@xxxxxxxxxxxxx>

Enviado: 16 de dezembro de 2022 16:06

Para: João Paulo Luís <joao.luis@xxxxxxxxx>

Cc: pgsql-performance@xxxxxxxxxxxxxxxxxxxx <pgsql-performance@xxxxxxxxxxxxxxxxxxxx>

Assunto: Re: Postgres12 looking for possible HashAggregate issue workarounds?

[Não costuma receber e-mails de pryzby@xxxxxxxxxxxxx. Saiba por que motivo isto é importante em
https://aka.ms/LearnAboutSenderIdentification. ]

CAUTION: External E-mail

On Fri, Dec 16, 2022 at 03:24:17PM +0000, João Paulo Luís wrote:

> Hi! Sorry to post to this mailing list, but I could not find many tips working around HashAggregate issues.

>

> In a research project involving text repetition analysis (on top of public documents)

> I have a VirtualMachine (CPU AMD Epyc 7502P, 128GB RAM, 12TB HDD, 2TB SSD),

> running postgres 12.12 (Ubuntu 12.12-0ubuntu0.20.04.1)

> and some tables with many rows:

> 1 - the query is making a postgresql project have 76.7 GB resident RAM usage.

> Having a WORK_MEM setting of 2GB (and "simple" COUNT() results),

> that was not expected.

> (I risk oom-killer killing my postgres as soon as I run another concurrent

> query.)

> The rows=261275 on HashAggregate  (cost=26397219.92..26399832.67 rows=261275 width=8) seems VERY WRONG!

> I was expecting something like rows=1.0E+09 instead.

> I would guess that HashAggregate is behaving very badly (using to much RAM beyond WORK_MEM, amd also badly estimating the #rows and taking forever...)

Huge memory use sounds like what was fixed in postgres 13.

https://www.postgresql.org/docs/13/release-13.html

Allow hash aggregation to use disk storage for large aggregation result

sets (Jeff Davis)

Previously, hash aggregation was avoided if it was expected to use more

than work_mem memory. Now, a hash aggregation plan can be chosen despite

that. The hash table will be spilled to disk if it exceeds work_mem

times hash_mem_multiplier.

This behavior is normally preferable to the old behavior, in which once

hash aggregation had been chosen, the hash table would be kept in memory

no matter how large it got — which could be very large if the planner

had misestimated. If necessary, behavior similar to that can be obtained

by increasing hash_mem_multiplier.

--

Justin