I wonder if this is related to the Linux memory overcommit problem: http://www.postgresql.org/docs/current/interactive/kernel-resources.html #AEN19361 -- Brandon Aiken CS/IT Systems Engineer -----Original Message----- From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Angva Sent: Wednesday, December 13, 2006 4:49 PM To: pgsql-general@xxxxxxxxxxxxxx Subject: [GENERAL] out of memory woes Hi everyone, First, this group has been good to me, and I thank you guys for the valuable help I've found here. I come seeking help with another problem. I am not even sure my problem lies in Postgres, but perhaps someone here has had a similar problem and could point me in the right direction. As I've mentioned in a few other posts, I run a daily job that loads large amounts of data into a Postgres database. It must run efficiently, so one of the tricks I do is run table loads, and commands such as cluster, in parallel. I am having a problem where my cluster job consistently fails with an out of memory error. It did not always do this, but at some point a memory leak or "something" built up to the point where clustering would always fail on one of the two largest tables. That is, four tables are clustered in parallel. The smaller of the two finish successfully. The remaining two - the largest - run for several minutes. Then one of the tables - not always the same one - gets an out of memory error and fails. So, suspecting a memory leak, I tried bouncing Postgres, and ran the clusters again. No luck - failed in the same manner. I don't know if it means anything, but swap never seems to be used by the postgres processes (I stared at top and vmstat for a while), though the VIRT column in top definitely shows something. Here are sample lines from top while two processed are running: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 30692 postgres 25 0 1033m 854m 459m R 100 14.3 1:10.68 postgres: secmaster dev_stage [local] SELECT 30691 postgres 23 0 1752m 1.6g 459m R 100 27.5 2:55.60 postgres: secmaster dev_stage [local] SELECT (You see SELECT instead of CLUSTER because I wrapped up my cluster commands in functions. I call them from psql by selecting the function.) Sample vmstat output: procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 3 2 724 18392 5052 5599624 0 0 915 2352 5 3 11 2 80 8 It seems that the postgres processes do not want to use swap -- swap never increases as the processes run. Again I am not sure whether this is significant. If I run the clusters sequentially, there is no error - they just take too long. I was out of ideas so I bounced the server entirely and ran the clusters in parallel a few times - success! But I don't want to have to bounce the server regularly. So, I'm not sure what to do. I need to figure out why the server would deteriorate in such a way. Any advice that can be offered would be much appreciated. I can provide any additional information that might be necessary. I am running Postgres 8.1.2 on CentOS 4.4 64-bit. Thanks a lot, Mark ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq