On Thu, Sep 16, 2021 at 08:35:40AM +1000, NeilBrown wrote: > On Wed, 15 Sep 2021, Michal Hocko wrote: > > On Wed 15-09-21 07:48:11, Neil Brown wrote: > > > > > > Why does __GFP_NOFAIL access the reserves? Why not require that the > > > relevant "Try harder" flag (__GFP_ATOMIC or __GFP_MEMALLOC) be included > > > with __GFP_NOFAIL if that is justified? > > > > Does 5020e285856c ("mm, oom: give __GFP_NOFAIL allocations access to > > memory reserves") help? > > Yes, that helps. A bit. > > I'm not fond of the clause "the allocation request might have come with some > locks held". What if it doesn't? Does it still have to pay the price. > > Should we not require that the caller indicate if any locks are held? > That way callers which don't hold locks can use __GFP_NOFAIL without > worrying about imposing on other code. > > Or is it so rare that __GFP_NOFAIL would be used without holding a lock > that it doesn't matter? > > The other commit of interest is > > Commit: 6c18ba7a1899 ("mm: help __GFP_NOFAIL allocations which do not trigger OOM killer") > > I don't find the reasoning convincing. It is a bit like "Robbing Peter > to pay Paul". It takes from the reserves to allow a __GFP_NOFAIL to > proceed, with out any reason to think this particular allocation has any > more 'right' to the reserves than anything else. > > While I don't like the reasoning in either of these, they do make it > clear (to me) that the use of reserves is entirely an internal policy > decision. They should *not* be seen as part of the API and callers > should not have to be concerned about it when deciding whether to use > __GFP_NOFAIL or not. Agree totally with this - we just want to block until allocation succeeds, and if the -filesystem- deadlocks because allocation never succeeds then that's a problem that needs to be solved in the filesystem with a different memory allocation strategy... OTOH, setting up a single __GFP_NOFAIL call site with the ability to take the entire system down seems somewhat misguided. > The use of these reserves is, at most, a hypothetical problem. If it > ever looks like becoming a real practical problem, it needs to be fixed > internally to the page allocator. Maybe an extra water-mark which isn't > quite as permissive as ALLOC_HIGH... > > I'm inclined to drop all references to reserves from the documentation > for __GFP_NOFAIL. I think there are enough users already that adding a > couple more isn't going to make problems substantially more likely. And > more will be added anyway that the mm/ team won't have the opportunity > or bandwidth to review. Yup, we've been replacing open coded loops like in kmem_alloc() with explicit __GFP_NOFAIL usage for a while now: $ ▶ git grep __GFP_NOFAIL fs/xfs |wc -l 33 $ ANd we've got another 100 or so call sites planned for conversion to __GFP_NOFAIL. Hence the suggestion to remove the use of reserves from __GFP_NOFAIL seems like a sensible plan because it has never been necessary in the past for all the allocation sites we are converting from open coded loops to __GFP_NOFAIL... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx