On Sat, Mar 02, 2024 at 10:47:59AM +1100, NeilBrown wrote: > On Sat, 02 Mar 2024, Kent Overstreet wrote: > > On Fri, Mar 01, 2024 at 04:54:55PM +1100, Dave Chinner wrote: > > > On Fri, Mar 01, 2024 at 01:16:18PM +1100, NeilBrown wrote: > > > > While we are considering revising mm rules, I would really like to > > > > revised the rule that GFP_KERNEL allocations are allowed to fail. > > > > I'm not at all sure that they ever do (except for large allocations - so > > > > maybe we could leave that exception in - or warn if large allocations > > > > are tried without a MAY_FAIL flag). > > > > > > > > Given that GFP_KERNEL can wait, and that the mm can kill off processes > > > > and clear cache to free memory, there should be no case where failure is > > > > needed or when simply waiting will eventually result in success. And if > > > > there is, the machine is a gonner anyway. > > > > > > Yes, please! > > > > > > XFS was designed and implemented on an OS that gave this exact > > > guarantee for kernel allocations back in the early 1990s. Memory > > > allocation simply blocked until it succeeded unless the caller > > > indicated they could handle failure. That's what __GFP_NOFAIL does > > > and XFS is still heavily dependent on this behaviour. > > > > I'm not saying we should get rid of __GFP_NOFAIL - actually, I'd say > > let's remove the underscores and get rid of the silly two page limit. > > GFP_NOFAIL|GFP_KERNEL is perfectly safe for larger allocations, as long > > as you don't mind possibly waiting a bit. > > > > But it can't be the default because, like I mentioned to Neal, there are > > a _lot_ of different places where we allocate memory in the kernel, and > > they have to be able to fail instead of shoving everything else out of > > memory. > > > > > This is the sort of thing I was thinking of in the "remove > > > GFP_NOFS" discussion thread when I said this to Kent: > > > > > > "We need to start designing our code in a way that doesn't require > > > extensive testing to validate it as correct. If the only way to > > > validate new code is correct is via stochastic coverage via error > > > injection, then that is a clear sign we've made poor design choices > > > along the way." > > > > > > https://lore.kernel.org/linux-fsdevel/ZcqWh3OyMGjEsdPz@xxxxxxxxxxxxxxxxxxx/ > > > > > > If memory allocation doesn't fail by default, then we can remove the > > > vast majority of allocation error handling from the kernel. Make the > > > common case just work - remove the need for all that code to handle > > > failures that is hard to exercise reliably and so are rarely tested. > > > > > > A simple change to make long standing behaviour an actual policy we > > > can rely on means we can remove both code and test matrix overhead - > > > it's a win-win IMO. > > > > We definitely don't want to make GFP_NOIO/GFP_NOFS allocations nofail by > > default - a great many of those allocations have mempools in front of > > them to avoid deadlocks, and if you do that you've made the mempools > > useless. > > > > Not strictly true. mempool_alloc() adds __GFP_NORETRY so the allocation > will certainly fail if that is appropriate. *nod* > I suspect that most places where there is a non-error fallback already > use NORETRY or RETRY_MAYFAIL or similar. NORETRY and RETRY_MAYFAIL actually weren't on my radar, and I don't see _tons_ of uses for either of them - more for NORETRY. My go-to is NOWAIT in this scenario though; my common pattern is "try nonblocking with locks held, then drop locks and retry GFP_KERNEL". > But I agree that changing the meaning of GFP_KERNEL has a potential to > cause problems. I support promoting "GFP_NOFAIL" which should work at > least up to PAGE_ALLOC_COSTLY_ORDER (8 pages). I'd support this change. > I'm unsure how it should be have in PF_MEMALLOC_NOFS and > PF_MEMALLOC_NOIO context. I suspect Dave would tell me it should work in > these contexts, in which case I'm sure it should. > > Maybe we could then deprecate GFP_KERNEL. What do you have in mind? Deprecating GFP_NOFS and GFP_NOIO would be wonderful - those should really just be PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO, now that we're pushing for memalloc_flags_(save|restore) more. Getting rid of those would be a really nice cleanup beacuse then gfp flags would mostly just be: - the type of memory to allocate (highmem, zeroed, etc.) - how hard to try (don't block at all, block some, block forever)