Michal Hocko wrote:
> That being said, I will keep refusing other such tweaks unless you have
> a sound usecase behind. If you really _want_ to help out here then you
> can focus on the reaping of the mlock memory.

It is not only the reaping of the mlock'ed memory. Although Manish's report
was the mlock'ed case, there are other cases (e.g. MAP_SHARED, mmu_notifier,
mmap_sem held for write) which can lead to this race condition. As an
artificial case, it would be possible to run 1024 threads which do not share
a signal_struct but consume almost 0KB of memory (i.e. written without using
the C library), with many of them running between __gfp_pfmemalloc_flags()
and mutex_trylock(), waiting for ALLOC_OOM.

What Manish's report revealed is the fact that we accidentally broke the

	/*
	 * Kill all user processes sharing victim->mm in other thread groups, if
	 * any. They don't get access to memory reserves, though, to avoid
	 * depletion of all memory. This prevents mm->mmap_sem livelock when an
	 * oom killed thread cannot exit because it requires the semaphore and
	 * its contended by another thread trying to allocate memory itself.
	 * That thread will now get access to memory reserves since it has a
	 * pending fatal signal.
	 */

assumption via commit 696453e66630ad45 ("mm, oom: task_will_free_mem should
skip oom_reaped tasks"), and we have already wasted 16 months on it. There is
no need to wait for fixes for the mlock'ed, MAP_SHARED, mmu_notifier and
mmap_sem cases, because the "OOM victims consuming almost 0KB of memory" case
cannot be solved, and the mlock'ed, MAP_SHARED, mmu_notifier and mmap_sem
cases are a sort of alias of that case.

Anyway, since you introduced the MMF_OOM_VICTIM flag, I will try a patch
which checks MMF_OOM_VICTIM instead of oom_reserves_allowed().
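For illustration only, a minimal sketch of the direction I have in mind. This
assumes the oom_reserves_allowed() helper in mm/page_alloc.c currently gates
ALLOC_OOM on tsk_is_oom_victim(); the actual patch I post may differ:

	--- a/mm/page_alloc.c
	+++ b/mm/page_alloc.c
	@@ static bool oom_reserves_allowed(struct task_struct *tsk)
	 static bool oom_reserves_allowed(struct task_struct *tsk)
	 {
	-	if (!tsk_is_oom_victim(tsk))
	+	/*
	+	 * Sketch (untested): check the mm rather than the task, so that
	+	 * threads in other thread groups sharing the victim's mm also get
	+	 * ALLOC_OOM instead of looping between __gfp_pfmemalloc_flags()
	+	 * and mutex_trylock().
	+	 */
	+	if (!tsk->mm || !test_bit(MMF_OOM_VICTIM, &tsk->mm->flags))
	 		return false;

The point is that MMF_OOM_VICTIM lives on the mm_struct, so every thread
mapping the victim's mm would be granted reserves, which is what the quoted
comment in oom_kill.c assumed before commit 696453e66630ad45.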