On Thu, Jul 21, 2016 at 04:53:10PM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@xxxxxxxx>
> Date: Thu, 21 Jul 2016 16:40:59 +0200
> Subject: [PATCH] Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are
> free elements"
>
> This reverts commit f9054c70d28bc214b2857cf8db8269f4f45a5e23.
>
> There has been a report about OOM killer invoked when swapping out to
> a dm-crypt device. The primary reason seems to be that the swapout
> IO managed to completely deplete memory reserves. Ondrej was
> able to bisect and explained the issue by pointing to f9054c70d28b
> ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements").
>
> The reason is that the swapout path is not throttled properly because
> the md-raid layer needs to allocate from the generic_make_request path
> which means it allocates from the PF_MEMALLOC context. dm layer uses
> mempool_alloc in order to guarantee a forward progress which used to
> inhibit access to memory reserves when using page allocator. This has
> changed by f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if
> there are free elements") which has dropped the __GFP_NOMEMALLOC
> protection when the memory pool is depleted.
>
> If we are running out of memory and the only way forward to free memory
> is to perform swapout we just keep consuming memory reserves rather than
> throttling the mempool allocations and allowing the pending IO to
> complete up to a moment when the memory is depleted completely and there
> is no way forward but invoking the OOM killer. This is less than
> optimal.
>
> The original intention of f9054c70d28b was to help with the OOM
> situations where the oom victim depends on mempool allocation to make a
> forward progress.
> David has mentioned the following backtrace:
>
>     schedule
>     schedule_timeout
>     io_schedule_timeout
>     mempool_alloc
>     __split_and_process_bio
>     dm_request
>     generic_make_request
>     submit_bio
>     mpage_readpages
>     ext4_readpages
>     __do_page_cache_readahead
>     ra_submit
>     filemap_fault
>     handle_mm_fault
>     __do_page_fault
>     do_page_fault
>     page_fault
>
> We do not know more about why the mempool is depleted without being
> replenished in time, though. In any case the dm layer shouldn't depend
> on any allocations outside of the dedicated pools so a forward progress
> should be guaranteed. If this is not the case then the dm should be
> fixed rather than papering over the problem and postponing it to later
> by accessing more memory reserves.
>
> mempools are a mechanism to maintain dedicated memory reserves to guarantee
> forward progress. Allowing them an unbounded access to the page allocator
> memory reserves is going against the whole purpose of this mechanism.
>
> Bisected-by: Ondrej Kozina <okozina@xxxxxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>

Reviewed-by: Johannes Weiner <hannes@xxxxxxxxxxx>

Thanks Michal
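
For anyone following the thread, the gfp-mask difference described in the
quoted changelog can be sketched in user-space C. This is only a rough
model under stated assumptions, not the kernel's mempool_alloc: the
SK_GFP_NOMEMALLOC value, the sk_* helper names, and the way curr_nr and
the PF_MEMALLOC context are passed in as plain parameters are all made up
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit for this sketch; NOT the kernel's real __GFP_NOMEMALLOC value. */
#define SK_GFP_NOMEMALLOC 0x1u

/*
 * Model of the behaviour the changelog attributes to f9054c70d28b:
 * the protection is applied only while the pool still has free elements.
 * curr_nr stands in for the number of elements left in the pool.
 */
static unsigned int sk_mask_before_revert(unsigned int gfp, int curr_nr)
{
    if (curr_nr > 0)
        gfp |= SK_GFP_NOMEMALLOC;
    return gfp;
}

/* Model of the behaviour the revert restores: the protection is always applied. */
static unsigned int sk_mask_after_revert(unsigned int gfp, int curr_nr)
{
    (void)curr_nr; /* pool fill level no longer matters */
    return gfp | SK_GFP_NOMEMALLOC;
}

/*
 * Simplified rule assumed here: a PF_MEMALLOC-style context may dip into
 * the page allocator reserves unless the mask forbids it.
 */
static bool sk_may_use_reserves(unsigned int gfp, bool pf_memalloc)
{
    return pf_memalloc && !(gfp & SK_GFP_NOMEMALLOC);
}

/* Self-check of the depleted-pool case discussed in the thread. */
static void sk_selfcheck(void)
{
    /* Depleted pool, before the revert: reserves are reachable. */
    assert(sk_may_use_reserves(sk_mask_before_revert(0u, 0), true));
    /* Depleted pool, after the revert: reserves stay off limits. */
    assert(!sk_may_use_reserves(sk_mask_after_revert(0u, 0), true));
}
```

With a depleted pool (curr_nr == 0) in a PF_MEMALLOC-style context, the
first model lets the allocation reach the reserves and the second does
not, which is the throttling difference the changelog argues for.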