On Thu, 11 Jul 2024 at 13:19, Dimitrios Apostolou <jimis@xxxxxxx> wrote:
> I have a table with 1000 partitions on PostgreSQL 16.
> I notice that a fairly complicated query of the form:
>
>     SELECT ... GROUP BY ... LIMIT ...
>
> causes the postgres backend process to grow insanely fast, and the
> kernel OOM killer to kill it rather soon.
> It seems it tries to allocate at least 1000 * work_mem.
>
>     ->  Append
>           ->  HashAggregate
>                 ->  Seq Scan
>           ->  ... 1000 more hashagg+seqscans
>
> Is this allocation pattern (work_mem * n_partitions) expected under any
> scenario? I can't find it documented. AFAIU the backend should allocate
> up to (depth_of_execution_plan * work_mem) (putting aside the
> hash_mem_multiplier and the parallel workers).

It's not the depth of the execution plan. It relates to the number of
nodes in the plan which allocate work_mem or work_mem *
hash_mem_multiplier. There is some documentation on this in [1]:

"Note that a complex query might perform several sort and hash operations
at the same time"

Also, see the warning about execution-time memory in [2].

> NOTE: after having written the above message, it occurred to me that I
> have enable_partitionwise_aggregate=on. Turning it off fixes the issue
> and makes the query faster too! Expected behaviour or bug?

enable_partitionwise_aggregate=on causes this query to perform an
aggregate per partition. If your GROUP BY clause values are distributed
evenly throughout all partitions, then you might find it's not much
slower to execute the query with enable_partitionwise_aggregate=off.

It's understandable that the way PostgreSQL uses work_mem isn't ideal
here, but unfortunately that's the current state of affairs. You might
want to reconsider your enable_partitionwise_aggregate setting and/or how
many partitions you have.

David

[1] https://www.postgresql.org/docs/current/runtime-config-resource.html
[2] https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES
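
As a rough sketch of how to compare the two plan shapes yourself (the
table and column names below are made up, not taken from your schema):

    SET enable_partitionwise_aggregate = on;
    EXPLAIN (COSTS OFF)
    SELECT some_col, count(*) FROM parted_tbl GROUP BY some_col LIMIT 10;
    -- expect one HashAggregate per partition underneath the Append, each
    -- able to use up to work_mem * hash_mem_multiplier at execution time

    SET enable_partitionwise_aggregate = off;
    EXPLAIN (COSTS OFF)
    SELECT some_col, count(*) FROM parted_tbl GROUP BY some_col LIMIT 10;
    -- expect a single HashAggregate above the Append, so only one such
    -- hash table is built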