Re: postmaster consuming /lots/ of memory with hash aggregate. why?

On Fri, Nov 5, 2010 at 7:26 PM, Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> wrote:
> I have a postgres 8.4.5 instance on CentOS 5 (x86_64) which appears to
> go crazy with the amount of memory it consumes.
> When I run the query below, in a matter of a few seconds memory
> balloons to 5.3G (virtual), 4.6G (resident) and 1840 (shared), and
> eventually the oom killer is invoked, killing the entire process.
>
> Physical memory is 8GB but other processes on the box consume
> approximately 4GB of that.
>
> The settings changed from their defaults:
>
> effective_cache_size = 4GB
> work_mem = 16MB
> maintenance_work_mem = 128MB
> wal_buffers = 16MB
> shared_buffers = 384MB
> checkpoint_segments = 64
>
> and
>
> default_statistics_target = 100
>
> The query is this:
>
> insert into d_2010_09_13_sum
>     select FOO.i, FOO.n, sum(FOO.cc) as cc, sum(FOO.oc) as oc
>     from (
>       select * from d_2010_09_12_sum
>       union all
>       select * from d_2010_09_13
>     ) AS FOO group by i, n;
>
> here is the explain:
>
>  Subquery Scan "*SELECT*"  (cost=1200132.06..1201332.06 rows=40000 width=80)
>    ->  HashAggregate  (cost=1200132.06..1200732.06 rows=40000 width=41)
>          ->  Append  (cost=0.00..786531.53 rows=41360053 width=41)
>                ->  Seq Scan on d_2010_09_12_sum  (cost=0.00..520066.48 rows=27272648 width=42)
>                ->  Seq Scan on d_2010_09_13  (cost=0.00..266465.05 rows=14087405 width=40)
>
> Both source tables freshly vacuum analyze'd.
> The row estimates are correct for both source tables.
>
> If I use "set enable_hashagg = false" I get this plan:
>
>  Subquery Scan "*SELECT*"  (cost=8563632.73..9081838.25 rows=40000 width=80)
>    ->  GroupAggregate  (cost=8563632.73..9081238.25 rows=40000 width=41)
>          ->  Sort  (cost=8563632.73..8667033.84 rows=41360441 width=41)
>                Sort Key: d_2010_09_12_sum.i, d_2010_09_12_sum.n
>                ->  Result  (cost=0.00..786535.41 rows=41360441 width=41)
>                      ->  Append  (cost=0.00..786535.41 rows=41360441 width=41)
>                            ->  Seq Scan on d_2010_09_12_sum  (cost=0.00..520062.04 rows=27272204 width=42)
>                            ->  Seq Scan on d_2010_09_13  (cost=0.00..266473.37 rows=14088237 width=40)
>
> and postmaster's memory never exceeds (roughly) 548M (virtual), 27M
> (resident), 5M (shared).
>
> I even set default_statistics_target to 1000 and re-ran "vacuum
> analyze verbose" on both tables - no change.
> If I set work_mem to 1MB (from 16MB) then the GroupAggregate variation
> is chosen instead.
> Experimentally, HashAggregate is chosen when work_mem is 16MB, 8MB,
> 6MB, 5MB but not 4MB and on down.
>
> Two things I don't understand:
>
> 1. Why, when hash aggregation is allowed, does memory absolutely
> explode (eventually invoking the wrath of the oom killer)? 16MB for
> work_mem does not seem outrageously high. For that matter, neither
> does 5MB.
>
> 2. Why do both the HashAggregate and GroupAggregate plans estimate
> 40000 rows?
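
For reference, the threshold experiment above was just a matter of
changing work_mem per session and re-running EXPLAIN, roughly like
this (a sketch, using the table names from the query above):

SET work_mem = '5MB';
EXPLAIN
select FOO.i, FOO.n, sum(FOO.cc) as cc, sum(FOO.oc) as oc
from (
  select * from d_2010_09_12_sum
  union all
  select * from d_2010_09_13
) AS FOO group by i, n;
-- shows HashAggregate at 5MB and above

SET work_mem = '4MB';
-- re-running the same EXPLAIN now shows GroupAggregate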

Unfortunately, I've found that as my database size grows, I've
generally had to disable hash aggregates for fear of even
simple-seeming queries running out of memory, even with work_mem = 1MB.

In some cases I've seen memory usage (with hashagg) grow to well over
5GB, while with GroupAggregate it barely moves. Am *I* doing something
wrong? Some of these queries are on partitioned tables (typically
querying the parent), and the resulting UNION or UNION ALL really
starts to hurt. When the server runs out of memory and kills off the
postmaster process a few minutes or even hours into a query, it
doesn't make anybody very happy.
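
In the meantime, the narrowest workaround I know of is to scope the
setting to a single transaction rather than the whole cluster, along
these lines (a sketch, reusing the insert from the quoted message):

begin;
-- SET LOCAL reverts automatically at commit or rollback
set local enable_hashagg = off;
insert into d_2010_09_13_sum
    select FOO.i, FOO.n, sum(FOO.cc) as cc, sum(FOO.oc) as oc
    from (
      select * from d_2010_09_12_sum
      union all
      select * from d_2010_09_13
    ) AS FOO group by i, n;
commit;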

Is there some setting I can turn on to see when memory is being
allocated (and, apparently, not deallocated)?

The latest query has a HashAggregate that looks like this:
HashAggregate  (cost=19950525.30..19951025.30 rows=40000 width=37)
but in reality there are approximately 200 million rows (that's what I
get when I run the query with GroupAggregate).

Why does it keep estimating 40,000 rows?

I suppose I could use the newly-learned ALTER USER trick to disable
hash aggregation for just the primary user, since disabling it
system-wide sounds fairly drastic. However, if I *don't* disable it,
the query quickly balloons memory usage to the point where the process
is killed off.
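
Concretely, I believe the per-user version would look something like
this ("myuser" is a placeholder for the real role name):

-- disable hash aggregation only for sessions of this role
ALTER USER myuser SET enable_hashagg = off;
-- and to undo it later:
ALTER USER myuser RESET enable_hashagg;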

-- 
Jon

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


