Re: Manual OOM killing?




Alan McKay wrote:
> Hey guys and gals,
>
> Yesterday I had one of my scientists kill one of my servers when his
> program ran amok and gobbled up all the memory, or forked too many
> processes, or something else; to be honest, I'm not exactly sure what.
>
> Is there something I can run manually in cron to look for rampant
> programs and kill them?   I know that may be hard to discern but I
> could also include a list of "known good" programs not to kill, as
> well as a list of "known suspect" user IDs.
>
> Anyone ever done this?  Searching the list on "OOM" does not bring up
> much.
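Nothing in the thread offers a ready-made tool, so here is a minimal sketch of the cron job described above, in Python and assuming a Linux /proc filesystem. This is a userspace watchdog, not the kernel's OOM killer. It reads each process's real UID, command name and resident memory (VmRSS) from /proc, skips the "known good" names, and selects processes owned by "known suspect" UIDs that exceed a memory threshold. The UID set, name list, 8 GiB threshold and function names are all hypothetical, and it defaults to a dry run that only prints what it would kill.

```python
import os
import signal

# Hypothetical policy values; adjust for your site.
SUSPECT_UIDS = {1001, 1002}              # "known suspect" user IDs
KNOWN_GOOD = {"sshd", "crond", "bash"}   # command names never to kill
RSS_LIMIT_KB = 8 * 1024 * 1024           # 8 GiB of resident memory, in kB


def read_proc(pid):
    """Return (uid, comm, rss_kb) for a PID, or None if it can't be read."""
    try:
        with open(f"/proc/{pid}/comm") as f:
            comm = f.read().strip()
        uid, rss_kb = None, 0
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("Uid:"):
                    uid = int(line.split()[1])      # real UID
                elif line.startswith("VmRSS:"):
                    rss_kb = int(line.split()[1])   # value is in kB
        return uid, comm, rss_kb
    except OSError:
        return None  # process exited or is not readable


def select_victims(procs, suspect_uids, known_good, rss_limit_kb):
    """Return PIDs owned by suspect users, not whitelisted, over the RSS limit."""
    return [
        pid
        for pid, (uid, comm, rss_kb) in procs.items()
        if uid in suspect_uids and comm not in known_good and rss_kb > rss_limit_kb
    ]


def main(dry_run=True):
    if not os.path.isdir("/proc"):
        print("this sketch needs Linux /proc")
        return
    procs = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            info = read_proc(int(entry))
            if info is not None:
                procs[int(entry)] = info
    for pid in select_victims(procs, SUSPECT_UIDS, KNOWN_GOOD, RSS_LIMIT_KB):
        print("would send SIGTERM to" if dry_run else "sending SIGTERM to", pid)
        if not dry_run:
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # it exited between the scan and the kill


if __name__ == "__main__":
    main(dry_run=True)
```

It could be run from root's crontab every few minutes (for example `*/5 * * * * /usr/bin/python3 /usr/local/sbin/memwatch.py`, where the path is hypothetical) once `dry_run` is switched to False. Two caveats: /proc/<pid>/comm is truncated to 15 characters, so whitelist names must match that form, and a fork bomb of many small processes will not trip an RSS threshold, so this only covers the "gobbled up all the memory" case.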

Yeah, we've had that happen a few times, even on 64-core systems with a
ridiculous amount of memory. One thing we did was tell them to limit the
number of cores their parallel processing threads use. There is some kind
of limit you can set up (I forget exactly what it is), but I'm not sure it
will limit memory usage.
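The limit isn't named above, so this is only one possibility: per-process resource limits via setrlimit(2), the mechanism behind the shell's `ulimit` and /etc/security/limits.conf. A minimal Python launcher sketch under that assumption is below. RLIMIT_AS does cap memory, but it caps virtual address space rather than resident memory, so programs that reserve large mappings may fail early. RLIMIT_NPROC counts all of the user's processes, not just this job's. The constants and function names are hypothetical, and cores are not limited here (taskset or cgroups would handle that).

```python
import resource
import subprocess
import sys

# Hypothetical caps; pick values that suit your machine.
MAX_ADDRESS_SPACE = 16 * 1024**3   # bytes of virtual memory per process
MAX_PROCESSES = 200                # counts ALL of the user's processes/threads


def _cap(which, value):
    """Set soft and hard limit to value, never above the existing hard limit."""
    _soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, value))


def _apply_limits():
    # Runs in the child after fork() and before exec(): the launcher itself is
    # not capped, while the job and any children it spawns inherit the limits.
    _cap(resource.RLIMIT_AS, MAX_ADDRESS_SPACE)
    _cap(resource.RLIMIT_NPROC, MAX_PROCESSES)


def run_limited(argv):
    """Run argv under the limits above and return its exit status."""
    return subprocess.run(argv, preexec_fn=_apply_limits).returncode


if __name__ == "__main__":
    if len(sys.argv) > 1:
        sys.exit(run_limited(sys.argv[1:]))
    print(f"usage: {sys.argv[0]} COMMAND [ARGS...]")
```

Saved as, say, a hypothetical `limited_run.py`, a job would be started with `python3 limited_run.py ./simulation --threads 8` (the job name and flag are also made up). Setting the hard limit too means the job cannot raise it again without root.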

Someone suggested yesterday, in another context, giving 'em a VM of their
own. Doing that, you can limit how many cores and how much memory they
have, and then if it crashes, it's more their problem than yours.

Or have them buy another server and turn the two into a cluster. We use
the Torque package.

      mark

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos

