Hi all,
I'm running a GROUP BY query on a table with over a billion rows, and memory usage seems to grow without bound. Eventually it exceeds physical memory and everything starts swapping. Here is what I gather to be the relevant info:
My machine has 768 MB of RAM.
shared_buffers = 128MB
work_mem = 8MB # this was originally higher, but I brought it down to try to fix the problem - it hasn't helped
maintenance_work_mem = 256MB
fsync = off
checkpoint_segments = 30
effective_cache_size = 256MB # this was originally 512MB but I recently brought it down - as expected, that didn't affect anything
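
In case the server is picking up different values than what's in postgresql.conf, something like this (untested, listing just the settings above) would show what it is actually running with:

-- cross-check the values the running server is actually using
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
               'effective_cache_size', 'fsync', 'checkpoint_segments');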
data=# explain select pid, min(oid) into nd_min from nd group by pid;
                               QUERY PLAN
------------------------------------------------------------------------
 HashAggregate  (cost=28173891.00..28174955.26 rows=85141 width=8)
   ->  Seq Scan on nd  (cost=0.00..21270355.00 rows=1380707200 width=8)
(2 rows)
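
If I understand HashAggregate right, it keeps one hash entry per pid group and (as far as I know, at least in 8.4) never spills to disk, so if that rows=85141 estimate is far below the real number of distinct pids the hash table could grow well past work_mem. Something along these lines (slow full scans, untested here) would show the real group count and the alternative plan:

-- how many distinct pids there really are (full scan, will take a while)
SELECT count(DISTINCT pid) FROM nd;

-- force a sort-based GroupAggregate instead of the HashAggregate, for comparison
SET enable_hashagg = off;
EXPLAIN SELECT pid, min(oid) FROM nd GROUP BY pid;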
data=# \d+ nd
Table "fullplanet091207osm.nd"
Column | Type | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
oid | integer | not null | plain |
pid | integer | not null | plain |
ref | integer | | plain |
Indexes:
"nd_pkey" PRIMARY KEY, btree (pid, oid)
Has OIDs: no
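
Since that rows=85141 estimate presumably comes from the column statistics, the planner's distinct-value estimate for pid should be visible in pg_stats; something like this (untested) would show it:

-- what the planner thinks about pid (this drives the rows=85141 group estimate)
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE schemaname = 'fullplanet091207osm'
  AND tablename = 'nd'
  AND attname = 'pid';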
VERSION = 'PostgreSQL 8.4.1 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.1-3ubuntu3) 4.4.1, 64-bit'