On Wed 15-07-20 10:30:51, David Rientjes wrote:
[...]
> I don't think moving the mem_cgroup_margin() check to out_of_memory()
> right before printing the oom info and killing the process is a very
> invasive patch. Any strong preference against doing it that way? I think
> moving the check as late as possible to save a process from being killed
> when racing with an exiter or killed process (including perhaps current)
> has a pretty clear motivation.

We have been through this discussion several times in the past, IIRC.
The conclusion has been that the allocator (the charging path for the
memcg) is the one to define the OOM situation. This is inherently racy
as long as we are not synchronizing oom with the world, which I believe
we agree we do not want to do. There are a few exceptions that bail out
early from the oom under certain situations, and the trend has been to
remove some of the existing ones rather than add new ones, because they
had subtle side effects and were prone to lockups.

As much as it might sound attractive to move mem_cgroup_margin (or the
last allocation attempt, respectively) closer to the actual oom killing,
I haven't seen any convincing data showing that such a change would make
a big difference. select_bad_process is not a free operation, as it
scales with the number of tasks in the oom domain, but it shouldn't be
super expensive. The oom reporting is by far the most expensive part of
the operation.

That being said, really convincing data should be presented in order to
make such a change. I do not think we want to do that just in case.
-- 
Michal Hocko
SUSE Labs