On Thu, Feb 08, 2024 at 08:55:05PM +0100, Vlastimil Babka (SUSE) wrote:
> On 2/8/24 18:33, Michal Hocko wrote:
> > On Thu 08-02-24 17:02:07, Vlastimil Babka (SUSE) wrote:
> >> On 1/9/24 05:47, Dave Chinner wrote:
> >> > On Thu, Jan 04, 2024 at 09:17:16PM +0000, Matthew Wilcox wrote:
> >>
> >> Your points and Kent's proposal of scoped GFP_NOWAIT [1] suggests to me this
> >> is no longer FS-only topic as this isn't just about converting to the scoped
> >> apis, but also how they should be improved.
> >
> > Scoped GFP_NOFAIL context is slightly easier from the semantic POV than
> > scoped GFP_NOWAIT as it doesn't add a potentially unexpected failure
> > mode. It is still tricky to deal with GFP_NOWAIT requests inside the
> > NOFAIL scope because that makes it a non failing busy wait for an
> > allocation if we need to insist on scope NOFAIL semantic.
> >
> > On the other hand we can define the behavior similar to what you
> > propose with RETRY_MAYFAIL resp. NORETRY. Existing NOWAIT users should
> > better handle allocation failures regardless of the external allocation
> > scope.
> >
> > Overriding that scoped NOFAIL semantic with RETRY_MAYFAIL or NORETRY
> > resembles the existing PF_MEMALLOC and GFP_NOMEMALLOC semantic and I do
> > not see an immediate problem with that.
> >
> > Having more NOFAIL allocations is not great but if you need to
> > emulate those by implementing the nofail semantic outside of the
> > allocator then it is better to have those retries inside the allocator
> > IMO.
>
> I see potential issues in scoping both the NOWAIT and NOFAIL
>
> - NOFAIL - I'm assuming Dave is adding __GFP_NOFAIL to xfs allocations or
> adjacent layers where he knows they must not fail for his transaction. But
> could the scope affect also something else underneath that could fail
> without the failure propagating in a way that it affects xfs?

Memory allocation failures below the filesystem (i.e.
in the IO path) will fail the IO, and if that happens for a read IO
within a transaction then it will have the same effect as XFS failing
a memory allocation, i.e. it will shut down the filesystem.

The key point here is that the moment we go below the filesystem, we
enter a new scoped allocation context with a guaranteed method of
returning errors: NOIO and bio errors. Once we cross an allocation
scope boundary, NOFAIL is no longer relevant to the code being run,
because other errors can occur there that the filesystem must already
handle. Hence memory allocation errors just don't matter at this
point, and the NOFAIL constraint is no longer relevant.

Hence we really need to consider NOFAIL differently to NOFS/NOIO.
NOFS/NOIO are about avoiding reclaim recursion deadlocks, so they are
relevant all the way down the stack. NOFAIL is only relevant to a
specific subsystem to prevent that subsystem's allocations from
failing, but as soon as we cross into another subsystem that can (and
does) return errors for memory allocation failures, the NOFAIL context
is no longer relevant. i.e. NOFAIL scopes are not relevant outside the
subsystem that sets them.

Hence we likely need helpers to clear and restore NOFAIL when we cross
an allocation context boundary, e.g. as we cross from the filesystem
to the block layer in the IO stack via submit_bio(). Maybe the
boundary code should do something like:

	nofail_flags = memalloc_nofail_clear();
	noio_flags = memalloc_noio_save();

	....

	memalloc_noio_restore(noio_flags);
	memalloc_nofail_reinstate(nofail_flags);

> Maybe it's a
> high-order allocation with a low-order fallback that really should not be
> __GFP_NOFAIL? We would need to hope it has something like RETRY_MAYFAIL or
> NORETRY already. But maybe it just relies on >costly order being more likely
> to fail implicitly, and those costly orders should be kept excluded from the
> scoped NOFAIL? Maybe __GFP_NOWARN should also override the scoped nofail?
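The following is a minimal userspace sketch of the boundary idea discussed above. It models per-task scope flags (standing in for current->flags) and combines two rules from this thread:

- A scoped NOFAIL is applied unless the caller explicitly passed NORETRY or RETRY_MAYFAIL.
- NOFAIL is cleared while NOIO is set when crossing into the block layer, and both are restored afterwards.

Everything prefixed sim_ is hypothetical, and so are the flag values; they are not the kernel's PF_* or __GFP_* bits. memalloc_nofail_clear()/memalloc_nofail_reinstate() are only the helpers proposed above. This assumes they are not existing kernel APIs.

```c
#include <assert.h>

/*
 * Userspace model of per-task allocation scopes. All sim_* names and all
 * flag values are hypothetical; they are not the kernel's PF_* or __GFP_* bits.
 */
#define SIM_PF_NOIO   0x1u /* models the NOIO scope */
#define SIM_PF_NOFAIL 0x2u /* models a proposed scoped NOFAIL */

#define SIM_GFP_IO            0x01u
#define SIM_GFP_FS            0x02u
#define SIM_GFP_NOFAIL        0x04u
#define SIM_GFP_NORETRY       0x08u
#define SIM_GFP_RETRY_MAYFAIL 0x10u
#define SIM_GFP_KERNEL        (SIM_GFP_IO | SIM_GFP_FS)

/* Stands in for current->flags of the running task. */
static unsigned int sim_task_flags;

/* Set a scope bit; return its previous state so scopes can nest. */
static unsigned int sim_scope_set(unsigned int bit)
{
	unsigned int old = sim_task_flags & bit;

	sim_task_flags |= bit;
	return old;
}

/* Clear a scope bit (the role of the proposed memalloc_nofail_clear()). */
static unsigned int sim_scope_clear(unsigned int bit)
{
	unsigned int old = sim_task_flags & bit;

	sim_task_flags &= ~bit;
	return old;
}

/* Return a scope bit to the state saved by sim_scope_set/clear. */
static void sim_scope_restore(unsigned int bit, unsigned int old)
{
	sim_task_flags = (sim_task_flags & ~bit) | old;
}

/* The mask an allocation would actually use, given the active scopes. */
static unsigned int sim_effective_gfp(unsigned int gfp)
{
	/* NOIO scope: strip both IO and FS reclaim recursion. */
	if (sim_task_flags & SIM_PF_NOIO)
		gfp &= ~(SIM_GFP_IO | SIM_GFP_FS);
	/* Scoped NOFAIL yields to callers that explicitly accept failure. */
	if ((sim_task_flags & SIM_PF_NOFAIL) &&
	    !(gfp & (SIM_GFP_NORETRY | SIM_GFP_RETRY_MAYFAIL)))
		gfp |= SIM_GFP_NOFAIL;
	return gfp;
}

/*
 * Models the filesystem -> block layer boundary (e.g. around submit_bio()):
 * drop NOFAIL, enter NOIO, do a block-layer allocation, then restore both
 * in reverse order. Returns the mask that allocation saw.
 */
static unsigned int sim_block_layer_alloc_mask(void)
{
	unsigned int nofail_flags = sim_scope_clear(SIM_PF_NOFAIL);
	unsigned int noio_flags = sim_scope_set(SIM_PF_NOIO);
	unsigned int mask = sim_effective_gfp(SIM_GFP_KERNEL);

	sim_scope_restore(SIM_PF_NOIO, noio_flags);
	sim_scope_restore(SIM_PF_NOFAIL, nofail_flags);
	return mask;
}
```

Inside a transaction-like scope opened with sim_scope_set(SIM_PF_NOFAIL), the three cases behave differently:

- Plain allocations pick up NOFAIL.
- NORETRY/RETRY_MAYFAIL allocations (readahead-style) still fail fast.
- The block-layer allocation sees neither NOFAIL nor IO/FS.

Each helper returns only the previous state of its own bit. This mirrors how memalloc_noio_save()/memalloc_noio_restore() let scopes nest.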
We definitely need NORETRY/RETRY_MAYFAIL to override scoped NOFAIL at
the filesystem layer (e.g. for readahead buffer allocations,
xlog_kvmalloc(), etc. to correctly fail fast within XFS transactions),
but I don't think we should force every subsystem to have to do this
just in case a higher level subsystem has set a scoped NOFAIL that it
needs in order to work correctly.

> - NOWAIT - as said already, we need to make sure we're not turning an
> allocation that relied on too-small-to-fail into a null pointer exception or
> BUG_ON(!page).

Agreed. Scoped NOWAIT removes existing allocation failure constraints
(such as too-small-to-fail), and I don't think that can be made to work
reliably. Error injection cannot prove the absence of errors, so we
can never be certain the code will always operate correctly and not
crash when an unexpected allocation failure occurs.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx