Hi Rik,

> > Sounds like a job for memory limits (ulimit?), not for OOM
> > notification, right?
>
> I suspect one problem could be that an HPC job scheduling program
> does not know exactly how much memory each job can take, so it can
> sometimes end up making a mistake and overcommitting the memory on
> one HPC node.
>
> In that case the user is better off having that job killed and
> restarted elsewhere, than having all of the jobs on that node
> crawl to a halt due to swapping.
>
> Paul, is this guess correct? :)

Yes. The Fujitsu HPC middleware watches the total memory consumption
of a job and, if the job over-consumes, kills its processes and
removes the job from the schedule. I think that is a common HPC
requirement. Note that we watch against a user-defined memory limit,
not swap usage.

Thanks.
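For illustration only, here is a minimal userspace sketch of that kind
of watchdog (this is a hypothetical example, not the actual Fujitsu
middleware): it sums the VmRSS of each process in a job from
/proc/<pid>/status and SIGKILLs the whole job when a user-defined
limit is exceeded. The pid list and the limit are assumed inputs from
the scheduler.

	/*
	 * Hypothetical sketch of a per-job memory watchdog.  Assumes
	 * the scheduler already knows the job's pids; error handling
	 * is kept to a minimum.
	 */
	#include <sys/types.h>
	#include <signal.h>
	#include <stdio.h>
	#include <unistd.h>

	/* Read VmRSS (in kB) from /proc/<pid>/status; 0 if unavailable. */
	static long rss_kb(pid_t pid)
	{
		char path[64], line[256];
		long kb = 0;
		FILE *f;

		snprintf(path, sizeof(path), "/proc/%d/status", pid);
		f = fopen(path, "r");
		if (!f)
			return 0;
		while (fgets(line, sizeof(line), f)) {
			if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
				break;
		}
		fclose(f);
		return kb;
	}

	/* Poll once a second; kill every process of the job on
	 * over-consumption, then let the scheduler requeue the job
	 * elsewhere. */
	static void watch_job(const pid_t *pids, int npids, long limit_kb)
	{
		for (;;) {
			long total = 0;
			int i;

			for (i = 0; i < npids; i++)
				total += rss_kb(pids[i]);
			if (total > limit_kb) {
				for (i = 0; i < npids; i++)
					kill(pids[i], SIGKILL);
				return;
			}
			sleep(1);
		}
	}

Polling /proc is just the simplest way to show the idea; a real
implementation could equally use cgroup memory accounting or an OOM
notification mechanism, which is what this thread is discussing.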