Re: out of memory

Craig White <craigwhite@xxxxxxxxxxx> · Wed, 30 Jul 2008 19:55:17 -0700

On Wed, 2008-07-30 at 22:19 -0400, Filipe Brandenburger wrote:
> On Wed, Jul 30, 2008 at 20:31, Craig White <craigwhite@xxxxxxxxxxx> wrote:
> > how does one determine who the culprit was?
> 
> Very hard... the kernel tries to "guess" which process is causing the
> issue, but from what I've seen (and I see OOMs every week) it guesses
> wrong most of the time. In my case, the victim ends up being "nscd"
> most of the time, even when I'm sure it's not using a lot of memory
> nor leaking.
> 
> In my case, usually when I start having OOMs I have them on several
> machines running the same programs (it's a grid) so it's more or less
> easy to find the culprit by looking at the jobs that were running on
> all affected machines.
> 
> In any case, my policy is to always reboot a machine after an OOM,
> since it may be in an incoherent state.
----
Well, I stopped using nscd a few years ago. It is definitely off after
the reboot, and chkconfig says it shouldn't start by itself, so I put it
in the realm of possible but unlikely.

I did update to 5.2 on Sunday and updated nss-ldap yesterday, and today -
boink. I have no way to know what actually caused this, though, as the
logs don't reveal enough as far as I can tell. The system has been up for
quite some time.

I suppose I could run some type of cron script that does something
like...

top -n 1 -b >> /tmp/top.log

so that if it happens again, I have a history of memory snapshots. Is
there a better idea?
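
A slightly fuller version of that idea might look like the sketch below.
It is only a rough outline, assuming the procps versions of ps and free
are installed. The function name snapshot_memory, the default log path
and the 15-process cutoff are made up for illustration. Each run appends
a timestamp, the overall memory and swap figures, and the processes
holding the most resident memory, so the log can be read back in order
after the next OOM.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped memory snapshot to a log file.
# The name snapshot_memory and the default path below are illustrative only.

snapshot_memory() {
    log="$1"
    {
        # Timestamp header so successive snapshots can be told apart.
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        # Overall memory and swap usage, in megabytes.
        free -m
        # Column header plus the 15 processes with the largest resident size.
        ps -eo pid,user,rss,vsz,comm --sort=-rss | head -n 16
        echo
    } >> "$log" 2>&1
}

# Log to the first argument if given, otherwise to an assumed default path.
snapshot_memory "${1:-/tmp/mem-snapshot.log}"
```

Saved as, say, /usr/local/bin/mem-snapshot.sh (a hypothetical path), it
could be run every five minutes from a crontab entry such as:

*/5 * * * * /usr/local/bin/mem-snapshot.sh

The log grows without bound, so it would need rotating or trimming.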

Craig

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos