On Wed 28-08-24 18:58:43, Kent Overstreet wrote:
> On Wed, Aug 28, 2024 at 09:26:44PM GMT, Michal Hocko wrote:
> > On Wed 28-08-24 15:11:19, Kent Overstreet wrote:
[...]
> > > It was decided _years_ ago that PF_MEMALLOC flags were how this was
> > > going to be addressed.
> >
> > Nope! It has been decided that _some_ gfp flags are acceptable to be used
> > by scoped APIs. Most notably NOFS and NOIO are compatible with reclaim
> > modifiers and other flags so these are indeed safe to be used that way.
>
> Decided by who?

It is decided by the semantic of the respective GFP flags and their
compatibility with others that could be nested in the scope.

Zone modifiers __GFP_DMA, __GFP_HIGHMEM, __GFP_DMA32 and __GFP_MOVABLE
would allow only __GFP_DMA to have scoped semantic because it is the
most restrictive of all of them (i.e. __GFP_DMA32 can be served from
__GFP_DMA but not the other way around) but nobody really requested
that.

__GFP_RECLAIMABLE is slab allocator specific and nested allocations
cannot be assumed to have shrinkers so this cannot really have scoped
semantic.

__GFP_WRITE only implies node spreading. Likely OK for scope interface,
nobody requested that.

__GFP_HARDWALL only to be used for user space allocations. Wouldn't
break anything if it had scoped interface but nobody requested that.

__GFP_THISNODE only to be used by allocators internally to define NUMA
placement strategy. Not safe for scoped interface as it changes the
failure semantic.

__GFP_ACCOUNT defines memcg accounting. Generally usable from user
context and safe for scope interface in that context as it doesn't
change the failure nor reclaim semantic.

__GFP_NO_OBJ_EXT internal flag not to be used outside of mm.

__GFP_HIGH gives access to memory reserves. It could be used for scope
interface but nobody requested that.

__GFP_MEMALLOC - already has a scope interface PF_MEMALLOC.
This is not really great though because it grants unbounded access to
memory reserves and that means that it is really tricky to see how many
allocations really can use reserves. It has been added because swap over
NFS had to guarantee forward progress and networking layer was not
prepared for that. Fundamentally this doesn't change the allocation nor
reclaim semantic so it is safe for a scope API.

__GFP_NOMEMALLOC used to override PF_MEMALLOC so a scoped interface
doesn't make much sense.

__GFP_IO already has scope interface to drop this flag. It is safe
because it doesn't change failure semantic and it makes the reclaim
context more constrained so it is compatible with other reclaim
modifiers. Conversely, it would be unsafe to have a scope interface to
add this flag because all GFP_NOIO nested allocations could deadlock.

__GFP_FS is similar to __GFP_IO.

__GFP_DIRECT_RECLAIM allows allocation to sleep. Scoped interface to set
the flag is unsafe for any nested GFP_NOWAIT/GFP_ATOMIC requests which
might be called from within atomic contexts. Scoped interface to clear
the flag is unsafe because __GFP_NOFAIL allocation mode doesn't support
requests without this flag so any nested NOFAIL allocation would break
and see unexpected and potentially unhandled failure mode.

__GFP_KSWAPD_RECLAIM controls whether kswapd is woken up. Doesn't change
the failure nor direct reclaim behavior. Scoped interface to set the
flag seems rather pointless and one to clear the bit dangerous because
it could put MM into unbalanced state as kswapd wouldn't wake up.

__GFP_RETRY_MAYFAIL - changes the failure mode so it is fundamentally
incompatible with nested __GFP_NOFAIL allocations. Scoped interface to
clear the flag would be safe but probably pointless.

__GFP_NORETRY - same as above.

__GFP_NOFAIL - incompatible with any nested GFP_NOWAIT/GFP_ATOMIC
allocations.
One could argue that those are fine to see allocation failure so this
will not create any unexpected failure mode, which is a fair argument,
but what would be the actual use case for setting all nested allocations
to NOFAIL mode when they likely have a failure mode? Interface to clear
the flag for the scope would be unsafe because all nested NOFAIL
allocations would get an unexpected failure mode.

__GFP_NOWARN safe to have scope interface both to set and clear the
flag.

__GFP_COMP only to be used for high order allocations and changes the
tail pages tracking which would break any nested high order request
without the flag. So unsafe for the scope interface both to set and
clear the flag.

__GFP_ZERO changes the initialization and is safe for scope interface.
We even have a global switch to do that for all allocations:
init_on_alloc.

__GFP_NOLOCKDEP disables lockdep reclaim recursion detection. Safe for
scope interface AFAICS.

-- 
Michal Hocko
SUSE Labs
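
To illustrate why a scope that only drops __GFP_IO/__GFP_FS composes
safely with nested requests, here is a minimal user-space sketch. It is
a model, not kernel code: the M_GFP_* bit values, struct task_model and
the scope_save()/scope_restore()/effective_gfp() helpers are all
hypothetical names made up for this example. The only thing it tries to
mirror is the idea that the scope masks reclaim-constraining bits out of
whatever the nested caller asked for and leaves the failure-mode bits
alone.

```c
#include <assert.h>

/* Hypothetical bit values for illustration; the kernel's real ones differ. */
typedef unsigned int gfp_model_t;

#define M_GFP_IO             0x1u
#define M_GFP_FS             0x2u
#define M_GFP_DIRECT_RECLAIM 0x4u
#define M_GFP_NOFAIL         0x8u

/* Hypothetical per-task state: bits that nested allocations must not use. */
struct task_model {
	gfp_model_t scope_cleared;
};

/*
 * Enter a scope that drops `bits` for all nested requests.
 * Returns the previous state so that scopes can nest and unwind in order.
 */
static gfp_model_t scope_save(struct task_model *t, gfp_model_t bits)
{
	gfp_model_t old = t->scope_cleared;

	/* Only dropping IO/FS is modelled; other bits change failure semantic. */
	assert((bits & ~(M_GFP_IO | M_GFP_FS)) == 0);
	t->scope_cleared |= bits;
	return old;
}

/* Leave the scope by restoring the state returned from scope_save(). */
static void scope_restore(struct task_model *t, gfp_model_t old)
{
	t->scope_cleared = old;
}

/*
 * Mask a nested request with the active scope. The scope can only make
 * the reclaim context more constrained; it never adds a bit and never
 * touches M_GFP_DIRECT_RECLAIM or M_GFP_NOFAIL.
 */
static gfp_model_t effective_gfp(const struct task_model *t,
				 gfp_model_t requested)
{
	return requested & ~t->scope_cleared;
}
```

In this model a nested NOFAIL request inside a NOFS-like scope still has
its direct reclaim and NOFAIL bits, which is the property the argument
above relies on. A scope that cleared M_GFP_DIRECT_RECLAIM instead would
hand that same nested request a mask that NOFAIL does not support.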