On Tue 16-11-21 10:28:25, Michal Hocko wrote:
> On Mon 15-11-21 16:58:19, Mina Almasry wrote:
[...]
> > To be honest I think this is very workable, as is Shakeel's suggestion
> > of MEMCG_OOM_NO_VICTIM. Since this is an opt-in feature, we can
> > document the behavior and if the userspace doesn't want to get killed
> > they can catch the sigbus and handle it gracefully. If not, the
> > userspace just gets killed if we hit this edge case.
>
> I am not sure about the MEMCG_OOM_NO_VICTIM approach. It sounds really
> hackish to me. I will get back to Shakeel's email as time permits. The
> primary problem I have with this, though, is that the kernel oom killer
> cannot really do anything sensible if the limit is reached and there
> is nothing reclaimable left in this case. The tmpfs backed memory will
> simply stay around and there are no means to recover without userspace
> intervention.

And just a small clarification. Tmpfs is fundamentally problematic from
the OOM handling POV. The nuance here is that the OOM happens in a
different memcg and thus a different resource domain. If you kill a task
in the target memcg then you effectively DoS that workload. If you kill
the allocating task then it is DoSed by anybody allowed to write to that
shmem. All that without a graceful fallback.

I still have a very hard time seeing how that can work reasonably except
for a very special case with a lot of other measures to ensure the target
memcg never hits the hard limit, so the OOM simply is not a problem.

The memory controller has always been used to enforce and balance memory
usage among resource domains, and this goes against that principle.

I would be really curious what Johannes thinks about this.
-- 
Michal Hocko
SUSE Labs