On Thu, Jan 22, 2009 at 02:00:55AM -0800, David Rientjes (rientjes@xxxxxxxxxx) wrote:
>
> In an exclusive cpuset, a task's memory is restricted to a set of mems
> that the administrator has designated. If it is oom, the kernel must free
> memory on those nodes or the next allocation will again trigger an oom
> (leading to a needlessly killed task that was in a disjoint cpuset).
>
> Really.

The whole point of the oom-killer is to kill the most appropriate task to
free memory. And while the task is selected system-wide, and some
tunables are added to tweak the behaviour local to some subsystems, this
cpuset feature is hardcoded into the selection algorithm. And when some
tunable starts doing its own calculation, the behaviour of this hardcoded
feature changes.

This is intended to change it. Because the admin has to have the ability
to tune the system the way he needs, and not rely on some special
heuristics, which may not work all the time. That is the point against
the cpuset argument. Make it tunable the same way we have oom_adj and/or
this cgroup order feature.

> > In this case administrator will not do this. It is up to him to decide
> > and not some inner kernel policy.
> >
>
> Then the scope of this new cgroup is restricted to not being used with
> cpusets that could oom.

These are perpendicular tasks: cpusets limit one area of the oom
handling, cgroup order limits another. Some people need cpusets, others
want cgroups. cpusets are not something so exceptional that only they
have to be taken into account when doing a system-wide operation like
OOM condition handling.

-- 
	Evgeniy Polyakov