On Fri, 22 Jan 2016, Tetsuo Handa wrote:

> > > (1) Design and use a system with appropriate memory capacity in mind.
> > >
> > > (2) When (1) failed, the OOM killer is invoked. The OOM killer selects
> > >     an OOM victim and allow that victim access to memory reserves by
> > >     setting TIF_MEMDIE to it.
> > >
> > > (3) When (2) did not solve the OOM condition, start allowing all tasks
> > >     access to memory reserves by your approach.
> > >
> > > (4) When (3) did not solve the OOM condition, start selecting more OOM
> > >     victims by my approach.
> > >
> > > (5) When (4) did not solve the OOM condition, trigger the kernel panic.
> > >
> >
> > This was all mentioned previously, and I suggested that the panic only
> > occur when memory reserves have been depleted, otherwise there is still
> > the potential for the livelock to be solved.  That is a patch that would
> > apply today, before any of this work, since we never want to loop
> > endlessly in the page allocator when memory reserves are fully depleted.
> >
> > This is all really quite simple.
> >
>
> So, David is OK with above approach, right?
> Then, Michal and Johannes, are you OK with above approach?
>

The first step before implementing access to memory reserves on livelock
(my patch) and oom killing additional processes on livelock (your patch)
is to detect the appropriate place to panic() when reserves are depleted.
This has historically been done in the oom killer when there are no oom
killable processes left.  That's easy to figure out and should still be
done, but we are now introducing the possibility of memory reserves being
fully depleted while there are oom killable processes left or victims that
cannot exit.  So we need a patch to the page allocator that would be
applicable today before any of the above is worked on to detect when
reserves are depleted and panic() rather than loop forever in the page
allocator.
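
For illustration only, the escalation order in steps (2) through (5)
quoted above, combined with the rule that depleted reserves always end in
panic(), could be modelled roughly as below.  This is a userspace toy, not
code from either patch: enum oom_stage, struct oom_state and next_stage()
are hypothetical names invented for this sketch, and the real detection of
"reserves depleted" in the page allocator is the open problem, not
something this shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stages mirroring steps (2)-(5) of the quoted proposal. */
enum oom_stage {
	STAGE_KILL_VICTIM,	/* (2) pick a victim, give it reserve access */
	STAGE_ALL_USE_RESERVES,	/* (3) let all tasks use memory reserves */
	STAGE_KILL_MORE,	/* (4) select additional victims */
	STAGE_PANIC		/* (5) nothing else can help */
};

/* Toy snapshot of allocator state; every field is an assumption. */
struct oom_state {
	bool reserves_depleted;	/* memory reserves are fully used up */
	bool victim_timed_out;	/* current stage made no progress in time */
	int killable_left;	/* oom killable processes remaining */
};

/* Decide which stage to be in next, given the current one. */
static enum oom_stage next_stage(enum oom_stage cur,
				 const struct oom_state *s)
{
	assert(s != NULL);

	/* Depleted reserves mean panic, whatever stage we are in. */
	if (s->reserves_depleted)
		return STAGE_PANIC;

	/* No timeout yet: the livelock may still resolve itself. */
	if (!s->victim_timed_out)
		return cur;

	switch (cur) {
	case STAGE_KILL_VICTIM:
		return STAGE_ALL_USE_RESERVES;
	case STAGE_ALL_USE_RESERVES:
	case STAGE_KILL_MORE:
		/* Historic rule: panic once nothing killable is left. */
		return s->killable_left > 0 ? STAGE_KILL_MORE : STAGE_PANIC;
	default:
		return STAGE_PANIC;
	}
}
```

The reserves check comes first on purpose: it is the part that would apply
today, independently of whether stages (3) and (4) are ever merged.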

I'd suggest that this work be done as a follow-up to Michal's patchset to
rework the page allocator retry logic.  It's not entirely trivial because
we want to detect situations when high-order < PAGE_ALLOC_COSTLY_ORDER
allocations are looping forever and we are failing due to fragmentation as
well.  If all cpus are looping trying to allocate a task_struct, and there
are eligible zones with some free memory but it is not allocatable, we
still want to panic().

> What I'm not sure about above approach are handling of !__GFP_NOFAIL &&
> !__GFP_FS allocation requests and use of ALLOC_NO_WATERMARKS without
> TIF_MEMDIE.
>
> Basically, we want to make small allocation requests success unless
> __GFP_NORETRY is given. Currently such allocation requests do not fail
> unless TIF_MEMDIE is given by the OOM killer. But how hard do we want to
> continue looping when we reach (3) by timeout for waiting for TIF_MEMDIE
> task at (2) expires?
>

In my patch, that is tunable by the user with a new sysctl and defines
when the oom killer is considered livelocked because the victim cannot
exit.  I think we'd do *did_some_progress = 1 for !__GFP_FS as is done
today before this expiration happens and otherwise trigger the oom killer
livelock detection in my patch to allow the allocation to succeed with
ALLOC_NO_WATERMARKS.