Michal Hocko wrote:
> On Thu 24-09-15 14:15:34, David Rientjes wrote:
> > > > Finally. Whatever we do, we need to change oom_kill_process() first,
> > > > and I think we should do this regardless. The "Kill all user processes
> > > > sharing victim->mm" logic looks wrong and suboptimal/overcomplicated.
> > > > I'll try to make some patches tomorrow if I have time...
> > >
> > > That would be appreciated. I do not like that part either. At least we
> > > shouldn't go over the whole list when we have a good chance that the mm
> > > is not shared with other processes.
> > >
> >
> > Heh, it's actually imperative to avoid livelocking based on mm->mmap_sem,
> > it's the reason the code exists. Any optimizations to that is certainly
> > welcome, but we definitely need to send SIGKILL to all threads sharing the
> > mm to make forward progress, otherwise we are going back to pre-2008
> > livelocks.
>
> Yes but mm is not shared between processes most of the time. CLONE_VM
> without CLONE_THREAD is more a corner case yet we have to crawl all the
> task_structs for _each_ OOM killer invocation. Yes this is an extreme
> slow path but still might take quite some unnecessarily time.

Excuse me, but thinking about the CLONE_VM without CLONE_THREAD case...
Isn't there a possibility of hitting livelocks at

        /*
         * If current has a pending SIGKILL or is exiting, then automatically
         * select it. The goal is to allow it to allocate so that it may
         * quickly exit and free its memory.
         *
         * But don't select if current has already released its mm and cleared
         * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
         */
        if (current->mm &&
            (fatal_signal_pending(current) || task_will_free_mem(current))) {
                mark_oom_victim(current);
                return true;
        }

if the current thread receives SIGKILL just before reaching here, since we
don't send SIGKILL to all threads sharing the mm?

Hopefully the current thread is not holding inode->i_mutex, because reaching
here (i.e. calling out_of_memory()) suggests that we are doing a GFP_KERNEL
allocation. But it could be a !__GFP_FS && __GFP_NOFAIL allocation, or could
different locks be contended by another thread sharing the mm?

I don't like the "That thread will now get access to memory reserves since
it has a pending fatal signal." line in the comments for the "Kill all user
processes sharing victim->mm" logic. That thread won't get access to memory
reserves unless it can call out_of_memory() (i.e. it is doing a __GFP_FS or
__GFP_NOFAIL allocation). Since I can observe that that thread may be doing
a !__GFP_FS allocation, I think this comment needs to be updated.
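A toy userspace model of the CLONE_VM-without-CLONE_THREAD concern may help. It is not kernel code. `struct sketch_mm`, `struct sketch_task`, `kill_thread_group()`, `kill_all_sharing_mm()` and `reap_killed()` are hypothetical simplifications. The model assumes that an mm goes away only when its last user exits, and that only tasks with a pending fatal signal exit. Under those assumptions, signalling only current's thread group leaves the mm pinned by the other process. Signalling every sharer lets it drop to zero users.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for mm_struct and task_struct. */
struct sketch_mm {
    int users;                 /* tasks still holding a reference to this mm */
};

struct sketch_task {
    int tgid;                  /* thread group id */
    struct sketch_mm *mm;      /* NULL once the task has released its mm */
    bool sigkill_pending;
};

/* SIGKILL only the thread group `tgid`, i.e. only current's process. */
static void kill_thread_group(struct sketch_task *t, size_t n, int tgid)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].tgid == tgid)
            t[i].sigkill_pending = true;
}

/* SIGKILL every task sharing `mm`, whatever its thread group
 * (the "Kill all user processes sharing victim->mm" idea). */
static void kill_all_sharing_mm(struct sketch_task *t, size_t n,
                                const struct sketch_mm *mm)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].mm == mm)
            t[i].sigkill_pending = true;
}

/* Killed tasks exit and drop their mm reference. Tasks without a fatal
 * signal keep running (or keep waiting) and keep the mm alive.
 * Returns the number of remaining users of `mm`. */
static int reap_killed(struct sketch_task *t, size_t n, struct sketch_mm *mm)
{
    for (size_t i = 0; i < n; i++) {
        if (t[i].sigkill_pending && t[i].mm == mm) {
            assert(mm->users > 0);
            mm->users--;
            t[i].mm = NULL;
        }
    }
    return mm->users;
}
```

Consider a two-thread process (tgid 100) and a separate CLONE_VM-only process (tgid 200) sharing one mm, with users = 3. Calling `kill_thread_group(t, 3, 100)` and then `reap_killed()` leaves users == 1. The mm is therefore not freed while tgid 200 is stuck. A later `kill_all_sharing_mm()` brings it down to 0.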
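The memory-reserves point can be reduced to a small predicate. The sketch below follows the claim in the mail: a task reaches out_of_memory(), and so the "pending SIGKILL selects current" shortcut, only for __GFP_FS or __GFP_NOFAIL allocations. `SKETCH_GFP_FS` and `SKETCH_GFP_NOFAIL` are hypothetical bit values chosen for illustration. They are not the kernel's flag values. The model also ignores other conditions in the allocator slow path, such as allocation order, zone and PF_DUMPCORE.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration only; not the kernel's values. */
#define SKETCH_GFP_FS      0x1u   /* allocation may recurse into the filesystem */
#define SKETCH_GFP_NOFAIL  0x2u   /* allocation must not fail */

/* Simplified: may this allocation invoke the OOM killer at all? */
static bool may_call_out_of_memory(unsigned int gfp)
{
    return (gfp & (SKETCH_GFP_FS | SKETCH_GFP_NOFAIL)) != 0;
}

/* A killed task gets memory reserves through the OOM path only if its
 * allocation can actually reach out_of_memory(). */
static bool gets_reserves_via_oom(unsigned int gfp, bool fatal_signal_pending)
{
    return fatal_signal_pending && may_call_out_of_memory(gfp);
}
```

Take a killed thread that is looping in a NOFS-style allocation, modelled here as gfp == 0. `gets_reserves_via_oom(0, true)` is false for that thread. This is why the comment "That thread will now get access to memory reserves" looks too strong.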