On Sat 18-04-20 11:13:11, Yafang Shao wrote:
> Without considering the manually triggered OOM, if no victim is found in
> system OOM, the system will be deadlocked on memory; however, if no
> victim is found in memcg OOM, the task can charge successfully and runs
> well. This behavior in memcg OOM is not proper because it can prevent
> the memcg from being limited.
>
> Take an easy example.
> $ cd /sys/fs/cgroup/foo/
> $ echo $$ > cgroup.procs
> $ echo 200M > memory.max
> $ cat memory.max
> 209715200
> $ echo -1000 > /proc/$$/oom_score_adj
> Then, let's run a memhog task in memcg foo, which will allocate 1G of
> memory and keep running.
> $ /home/yafang/test/memhog &

Well, echo -1000 is a privileged operation. And it has to be used with
extreme care, because you know that you are creating an unkillable task.
So the above test is a clear example of a misconfiguration.

> Then memory.current will be greater than memory.max. Run the below
> command in another shell.
> $ cat /sys/fs/cgroup/foo/memory.current
> 1097228288
> The tasks which have already allocated memory and won't allocate new
> memory still run well. This behavior makes no sense.
>
> This patch improves that.
> If no victim is found in memcg OOM, we should force the current task to
> wait until pages become available. That is similar to the behavior in
> memcg v1 when oom_kill_disable is set.

The primary reason why we force the charge is that we _cannot_ wait
indefinitely in the charge path: the current call chain might hold locks
or other resources which could block a large part of the system. You are
essentially reintroducing that behavior.

Is the above example a real usecase, or have you just constructed a test
case that triggers the problem?

-- 
Michal Hocko
SUSE Labs