Re: out of memory woes

"Brandon Aiken" <BAiken@xxxxxxxxxxxxxxx> · Thu, 14 Dec 2006 11:41:57 -0500

I wonder if this is related to the Linux memory overcommit problem:

http://www.postgresql.org/docs/current/interactive/kernel-resources.html
#AEN19361

--
Brandon Aiken
CS/IT Systems Engineer

-----Original Message-----
From: pgsql-general-owner@xxxxxxxxxxxxxx
[mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Angva
Sent: Wednesday, December 13, 2006 4:49 PM
To: pgsql-general@xxxxxxxxxxxxxx
Subject: [GENERAL] out of memory woes

Hi everyone,

First, this group has been good to me, and I thank you guys for the
valuable help I've found here. I come seeking help with another
problem. I am not even sure my problem lies in Postgres, but perhaps
someone here has had a similar problem and could point me in the right
direction.

As I've mentioned in a few other posts, I run a daily job that loads
large amounts of data into a Postgres database. It must run
efficiently, so one of the tricks I do is run table loads, and commands
such as cluster, in parallel. I am having a problem where my cluster
job consistently fails with an out of memory error. It did not always
do this, but at some point a memory leak or "something" built up to the
point where clustering would always fail on one of the two largest
tables. That is, four tables are clustered in parallel. The smaller of
the two finish successfully. The remaining two - the largest - run for
several minutes. Then one of the tables - not always the same one -
gets an out of memory error and fails.

So, suspecting a memory leak, I tried bouncing Postgres, and ran the
clusters again. No luck - failed in the same manner.

I don't know if it means anything, but swap never seems to be used by
the postgres processes (I stared at top and vmstat for a while), though
the VIRT column in top definitely shows something. Here are sample
lines from top while two processed are running:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
30692 postgres  25   0 1033m 854m 459m R  100 14.3   1:10.68 postgres:
secmaster dev_stage [local] SELECT
30691 postgres  23   0 1752m 1.6g 459m R  100 27.5   2:55.60 postgres:
secmaster dev_stage [local] SELECT

(You see SELECT instead of CLUSTER because I wrapped up my cluster
commands in functions. I call them from psql by selecting the
function.)

Sample vmstat output:

procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us
sy id wa
 3  2    724  18392   5052 5599624    0    0   915  2352    5     3 11
2 80  8

It seems that the postgres processes do not want to use swap -- swap
never increases as the processes run. Again I am not sure whether this
is significant.

If I run the clusters sequentially, there is no error - they just take
too long. I was out of ideas so I bounced the server entirely and ran
the clusters in parallel a few times - success! But I don't want to
have to bounce the server regularly.

So, I'm not sure what to do. I need to figure out why the server would
deteriorate in such a way. Any advice that can be offered would be much
appreciated. I can provide any additional information that might be
necessary.

I am running Postgres 8.1.2 on CentOS 4.4 64-bit.

Thanks a lot,
Mark

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq