On Tue 19-07-16 17:50:29, Mikulas Patocka wrote:
>
>
> On Mon, 18 Jul 2016, Michal Hocko wrote:
>
> > From: Michal Hocko <mhocko@xxxxxxxx>
> >
> > There has been a report about OOM killer invoked when swapping out to
> > a dm-crypt device. The primary reason seems to be that the swapout
> > IO managed to completely deplete memory reserves. Mikulas was
> > able to bisect and explained the issue by pointing to f9054c70d28b
> > ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements").
> >
> > The reason is that the swapout path is not throttled properly because
> > the md-raid layer needs to allocate from the generic_make_request path
> > which means it allocates from the PF_MEMALLOC context. dm layer uses
> > mempool_alloc in order to guarantee a forward progress which used to
> > inhibit access to memory reserves when using page allocator. This has
> > changed by f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if
> > there are free elements") which has dropped the __GFP_NOMEMALLOC
> > protection when the memory pool is depleted.
> >
> > If we are running out of memory and the only way forward to free memory
> > is to perform swapout we just keep consuming memory reserves rather than
> > throttling the mempool allocations and allowing the pending IO to
> > complete up to a moment when the memory is depleted completely and there
> > is no way forward but invoking the OOM killer. This is less than
> > optimal.
> >
> > The original intention of f9054c70d28b was to help with the OOM
> > situations where the oom victim depends on mempool allocation to make a
> > forward progress. We can handle that case in a different way, though. We
> > can check whether the current task has access to memory reserves as an
> > OOM victim (TIF_MEMDIE) and drop __GFP_NOMEMALLOC protection if the pool
> > is empty.
> >
> > David Rientjes was objecting that such an approach wouldn't help if the
> > oom victim was blocked on a lock held by process doing mempool_alloc.
> > This is very similar to other oom deadlock situations and we have
> > oom_reaper to deal with them so it is reasonable to rely on the same
> > mechanism rather than inventing a different one which has negative
> > side effects.
> >
> > Fixes: f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements")
> > Bisected-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
>
> Bisect was done by Ondrej Kozina.

OK, fixed

> > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>
> Reviewed-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> Tested-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>

Let's see whether we decide to go with this patch or a plain revert. In
any case I will mark the patch for stable so it will end up in both 4.6
and 4.7.

Anyway thanks for your and Ondrej's help here!
-- 
Michal Hocko
SUSE Labs
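
The three policies discussed in the quoted changelog can be compared in a
small userspace model. This is only a sketch under stated assumptions, not
the kernel patch: MODEL_GFP_NOMEMALLOC, struct model_pool, the mask_*()
helpers and check_model() are hypothetical names, the flag value is
illustrative, and the oom_victim argument stands in for the TIF_MEMDIE
check mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit only; NOT the kernel's real __GFP_NOMEMALLOC value. */
#define MODEL_GFP_NOMEMALLOC 0x1u

/* Hypothetical stand-in for a mempool's count of free reserved elements. */
struct model_pool {
	int curr_nr;
};

/* Before f9054c70d28b, as described: always inhibit memory reserves. */
static unsigned int mask_before(unsigned int gfp)
{
	return gfp | MODEL_GFP_NOMEMALLOC;
}

/* After f9054c70d28b, as described: inhibit only while elements are free. */
static unsigned int mask_f9054c70d28b(unsigned int gfp,
				      const struct model_pool *pool)
{
	return pool->curr_nr > 0 ? (gfp | MODEL_GFP_NOMEMALLOC) : gfp;
}

/* Proposal in this thread: drop the protection only for an OOM victim
 * (modelled by oom_victim) that finds the pool empty. */
static unsigned int mask_proposed(unsigned int gfp,
				  const struct model_pool *pool,
				  bool oom_victim)
{
	if (oom_victim && pool->curr_nr == 0)
		return gfp;
	return gfp | MODEL_GFP_NOMEMALLOC;
}

void check_model(void);

/* Walk through the cases named in the changelog. */
void check_model(void)
{
	struct model_pool empty = { .curr_nr = 0 };
	struct model_pool stocked = { .curr_nr = 4 };

	/* Swapout path, not an OOM victim, pool depleted. */
	assert(mask_before(0u) == MODEL_GFP_NOMEMALLOC);
	assert(mask_f9054c70d28b(0u, &empty) == 0u); /* reserves reachable */
	assert(mask_proposed(0u, &empty, false) == MODEL_GFP_NOMEMALLOC);

	/* OOM victim with a depleted pool may dip into the reserves. */
	assert(mask_proposed(0u, &empty, true) == 0u);

	/* A stocked pool keeps the protection under every policy. */
	assert(mask_f9054c70d28b(0u, &stocked) == MODEL_GFP_NOMEMALLOC);
	assert(mask_proposed(0u, &stocked, true) == MODEL_GFP_NOMEMALLOC);
}
```

Calling check_model() from a test main() exercises the swapout case: with
the f9054c70d28b behaviour a depleted pool leaves the reserves reachable
for any caller, while the proposed check keeps them off limits unless the
caller is an OOM victim.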