Timothy Garnett <tgarnett@xxxxxxxxxxx> writes:
> I have a query that's pulling data for another system using COPY (query) to
> STDOUT CSV on a 9.2.4 db (we're in the process of upgrading to 9.3).
> ...
> We're running into problems with the machine running out of memory with
> this single query process consuming over 100GB resident memory before the
> machine exhausts swap and the Linux OOM handling eventually kills it.

I wonder if you're hitting some sort of memory leak.  What I'd suggest
doing to help diagnose that is to show us a memory map.  Do this:

(1) Set a ulimit so that the process will get ENOMEM sometime before the
OOM killer awakens (this is good practice anyway, if you've not disabled
OOM kills).  On Linux systems, ulimit -m or -v generally does the trick.
The easiest way to enforce this is to add a ulimit command to the script
that launches the postmaster, then restart.

(2) Make sure your logging setup will collect anything printed to stderr
by a backend.  If you use logging_collector you're good to go; if you use
syslog you need to check where the postmaster's stderr was redirected,
making sure it's not /dev/null.

(3) Run the failing query.  Collect the memory map it dumps to stderr
when it fails, and send it in.  What you're looking for is a couple
hundred lines looking like this:

TopMemoryContext: 69984 total in 10 blocks; 6152 free (16 chunks); 63832 used
  MessageContext: 8192 total in 1 blocks; 7112 free (1 chunks); 1080 used
  Operator class cache: 8192 total in 1 blocks; 1640 free (0 chunks); 6552 used
  smgr relation table: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used
  ... lots more in the same vein ...

> As to the right join (used for a few of the joins, most are left join or
> merge):
>         ->  Hash Right Join  (cost=225541299.19..237399743.38
>             rows=86681834 width=1108)
>               Hash Cond: (xxx.xxx = yyy.yyy)
>               ->  Seq Scan on xxx  (cost=0.00..6188.18
>                   rows=9941 width=20)
>                     Filter: (mode = 'live'::text)
>               ->  Hash  (cost=212606744.27..212606744.27
>                   rows=86681834 width=1096)
> ....
> I'm not sure if I'm reading it right, but it looks like it's hashing the 86
> million row set and scanning over the 10k row set which seems to me like
> the opposite of what you'd want to do, but I haven't seen a lot of hash
> right joins in plans and I'm not sure if that's how it works.

That looks pretty odd to me too, though I guess the planner might think
it was sensible if xxx's join column had very low cardinality.  Still,
it's weird.  What have you got work_mem set to, exactly?

			regards, tom lane
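
For concreteness, here is a minimal sketch of step (1).  The wrapper-script
approach, the pg_ctl paths, and the 8GB cap are assumptions for illustration
only; adapt them to however the postmaster is actually launched on the
affected machine.

    #!/bin/sh
    # Hypothetical wrapper that starts the postmaster with an address-space
    # limit, so a runaway backend gets ENOMEM (and dumps its memory-context
    # map to stderr) instead of being killed by the Linux OOM killer.

    # ulimit -v takes kilobytes; 8388608 KB = 8GB.  Pick a value comfortably
    # above the server's normal per-process usage.
    ulimit -v 8388608

    # Limits set here are inherited by the postmaster and all its backends.
    exec pg_ctl -D /var/lib/pgsql/data -l /var/lib/pgsql/logfile start

To answer the work_mem question, the current setting can be checked with,
for example:

    psql -c "SHOW work_mem;"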