Re: work_mem greater than 2GB issue

You're right. At a certain work_mem threshold the planner switches over to
a HashAggregate plan. When it does, it eats up a lot of memory. With
GroupAggregate it never uses more than work_mem.

I'm using PostgreSQL 8.3.3 (64-bit) on CentOS 5.
The query I'm running is:

select keyword, partner_id, sum(num_searches) as num_searches
	from partner_country_keywords
	group by partner_id, keyword

against

create table partner_country_keywords (
    keyword varchar,
    partner_id integer,
    country char(2),
    num_searches integer
);

The table partner_country_keywords is 8197 MB according to pg_class,
with 126,171,000 rows.
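(For reference, those figures can be read from pg_class with something
like the following sketch; relpages is counted in 8kB blocks and
reltuples is the analyzer's row estimate:)

SELECT relpages::bigint * 8 / 1024 AS size_mb,  -- 8kB pages -> MB
       reltuples::bigint AS approx_rows
	FROM pg_class
	WHERE relname = 'partner_country_keywords';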

EXPLAIN at work_mem = 2GB:

HashAggregate  (cost=3257160.40..3414874.00 rows=12617088 width=28)
  ->  Seq Scan on partner_country_keywords  (cost=0.00..2310878.80 rows=126170880 width=28)

This will continue to eat memory up to about 7.2GB (on an 8GB machine).

EXPLAIN at work_mem = 1300MB:

"GroupAggregate  (cost=22306417.24..23725839.64 rows=12617088
width=28)"
"  ->  Sort  (cost=22306417.24..22621844.44 rows=126170880 width=28)"
"        Sort Key: partner_id, keyword"
"        ->  Seq Scan on partner_country_keyword
(cost=0.00..2310878.80 rows=126170880 width=28)"
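
Both plans can be reproduced in a single session by toggling work_mem
before EXPLAIN (a sketch; the exact crossover point depends on the
planner's cost estimates, not on these particular values):

SET work_mem = '2GB';      -- above the threshold: planner picks HashAggregate
EXPLAIN select keyword, partner_id, sum(num_searches) as num_searches
	from partner_country_keywords
	group by partner_id, keyword;

SET work_mem = '1300MB';   -- below the threshold: planner falls back to Sort + GroupAggregate
EXPLAIN select keyword, partner_id, sum(num_searches) as num_searches
	from partner_country_keywords
	group by partner_id, keyword;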

So is this a planning mistake? Should a hash be allowed to grow larger
than work_mem at all before it starts spilling to disk?
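
As a workaround in the meantime, the hash plan can be disabled
explicitly (a sketch; enable_hashagg is a standard planner setting):

SET enable_hashagg = off;  -- planner will use Sort + GroupAggregate instead
select keyword, partner_id, sum(num_searches) as num_searches
	from partner_country_keywords
	group by partner_id, keyword;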

On May 14, 4:11 pm, st...@xxxxxxxxxxxxxxxx (Gregory Stark) wrote:
> wickro <robwick...@xxxxxxxxx> writes:
> > Hi everyone,
>
> > I have a largish table (> 8GB) that I'm doing a very simple single
> > GROUP BY on. I am the only user of this database. If I set work_mem to
> > anything under 2GB (e.g. 1900MB), the postmaster process's memory stops
> > growing at that value while it's performing its GROUP BY. There is only
> > one hash operation, so that is what I would expect. But anything larger
> > and it eats up all memory until it can't get any more (around 7.5GB on
> > an 8GB machine). Has anyone experienced anything of this sort before?
>
> What does EXPLAIN say for both cases? I suspect what's happening is that the
> planner is estimating it will need 2G to hash all the values and in fact it
> would need >8G. So for values under 2G it uses a sort and not a hash at all,
> while for values over 2G it's trying to use a hash and failing.
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>   Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
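
One way to check Gregory's theory against the table statistics (a
sketch; pg_stats is the standard statistics view, and rows=12617088 in
the plans above is the planner's group-count estimate):

-- negative n_distinct values are fractions of the row count
SELECT attname, n_distinct
	FROM pg_stats
	WHERE tablename = 'partner_country_keywords'
	AND attname IN ('partner_id', 'keyword');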


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

