* Matt Magoffin (postgresql.org@xxxxxxx) wrote:
> > Does the result from 'free' look reasonable on this box?
>
> I think so:
>
>              total       used       free     shared    buffers     cached
> Mem:      16432296   16273964     158332          0     173536   14321340
> -/+ buffers/cache:    1779088   14653208
> Swap:      2096440        560    2095880

That certainly looks fine, and you've got 14G or so which should be
available for this query.  Was this near the time the query was running?
Could you give us what 'free' returns when the query is close to the
out-of-memory error?  I'd expect the 2nd row under 'free' to be getting
low for the allocation to fail.

> Just running top, it does appear to chew through a fair amount of memory.
> Here's a snapshot from top of the postgres process running this query
> from just before it ran out of memory:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
>  4486 postgres  18   0 4576m 3.6g 3.3g R   90 23.1  0:34.23 postgres:
> postgres lms_nna [local] EXPLAIN
>
> These values did start out low, for example the RES memory started in the
> 130MB range, then climbed to the 3.6GB you see here.

Uhh..  I saw that your system was 64-bit, but is your PG process compiled
as 64-bit?  Maybe you're hitting an artificial 32-bit limit, which isn't
exactly helped by your shared_buffers being set so high to begin with.
Run 'file' on your postgres binary, like so:

sfrost@snowman:/home/sfrost> file /usr/lib/postgresql/8.3/bin/postgres
/usr/lib/postgresql/8.3/bin/postgres: ELF 64-bit LSB executable, x86-64,
version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux
2.6.8, stripped

(That's on Debian; your postgres binary will probably be somewhere else.)

> I set the work_mem to 2MB, and the query does actually complete (explain
> analyze output below), so does this mean that the query simply uses too
> much memory for sorting/joining, and we'd have to either allocate enough
> work_mem to allow the query to complete, or a smaller work_mem as shown
> here to make the query use slower disk-based sorting?  The row counts are
> matching what we'd expect from this query.

Allocating more work_mem won't help in this situation.  PG's out-of-memory
error is only ever caused by an actual allocation failure.  The work_mem,
to PG, is more of a guideline than any kind of hard limit.  Based on the
explain analyze, I continue to feel that your query isn't actually all
that big in terms of memory usage (the disk-based sorting taking place was
in the 50M range or so, from what I saw, and you should have plenty of
memory for such a query).

Check your binary; I suspect that's where your issue is.  Now, I don't
recall offhand if you can just recompile in place or if you need to do a
dump/restore, but I'd definitely do a backup just in case and keep it
handy (or maybe use it just to be safe), and see if PG complains when it's
started up.

	Thanks,

		Stephen
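
A couple of command sketches related to the suggestions above.  These are
illustrative only, not part of the original exchange; one way to capture
the 'free' numbers while the query is running is simply to log them from a
second terminal (the interval and log filename are arbitrary):

    # log overall memory usage every 5 seconds until interrupted (Ctrl-C)
    while true; do date; free -m; sleep 5; done | tee free.log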
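And a minimal sketch of re-running the statement with a smaller
per-session work_mem, as Matt described.  The lms_nna database name is
taken from the top output above; the query itself is a placeholder to be
replaced with the real one:

    # SET here only affects this session; postgresql.conf is unchanged
    psql -d lms_nna -c "SET work_mem = '2MB'; EXPLAIN ANALYZE <your query here>;"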