On Wed, 2008-07-30 at 22:19 -0400, Filipe Brandenburger wrote:
> On Wed, Jul 30, 2008 at 20:31, Craig White <craigwhite@xxxxxxxxxxx> wrote:
> > how does one determine who the culprit was?
>
> Very hard... the kernel tries to "guess" which process is causing the
> issue, but from what I've seen (and I see OOMs every week) it guesses
> wrong most of the time. In my case, the victim ends up being "nscd"
> most of the time, even when I'm sure it's not using a lot of memory
> nor leaking.
>
> In my case, usually when I start having OOMs I have them on several
> machines running the same programs (it's a grid) so it's more or less
> easy to find the culprit by looking at the jobs that were running on
> all affected machines.
>
> In any case, my policy is to always reboot a machine after an OOM,
> since it may be in an incoherent state.
----
Well, I stopped using nscd a few years ago. It is definitely off after
the reboot, and chkconfig says it shouldn't start by itself, so I put
nscd in the realm of possible but unlikely.

I did update to 5.2 on Sunday and updated nss-ldap yesterday and
today - boink. I have no way to know what actually caused this, though,
as the logs don't reveal enough as far as I can tell. The system has
been up for quite some time.

I suppose I could run some type of cron script that does something
like...

    top -n 1 -b >> /tmp/top.log

so if it happens again, I get a memory snapshot history... is there a
better idea?

Craig

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
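
One possible refinement of the cron idea above, offered as a rough sketch rather than a tested recipe: timestamp each snapshot and record the processes with the largest resident memory, so the log can be lined up against the time of an OOM. The function name `snapshot_memory`, the script path in the comment, and the five-minute interval are all assumptions for illustration; `/tmp/top.log` is simply the path from the post. It assumes a Linux `/proc/meminfo` and the procps `ps`, which accepts `--sort`.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped memory snapshot to a log file.
snapshot_memory() {
    log=$1
    {
        # Marker line so each snapshot can be matched to a time
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        # Overall memory and swap figures as reported by the kernel
        grep -E '^(MemTotal|MemFree|SwapTotal|SwapFree):' /proc/meminfo
        # Header line plus the 15 processes with the largest resident set size
        ps -eo pid,ppid,user,rss,vsz,args --sort=-rss | head -n 16
        echo
    } >> "$log" 2>&1
}

# Run only when a log path is given, e.g. from a crontab entry such as:
#   */5 * * * * /usr/local/bin/mem-snapshot.sh /tmp/top.log
if [ -n "${1:-}" ]; then
    snapshot_memory "$1"
fi
```

Sorting by RSS puts the likely memory hog at the top of each snapshot, whereas plain `top -b` sorts by CPU by default. The log grows without bound, so it would need rotating or trimming if left running for long.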