On Tue, Oct 05, 2021 at 02:27:45PM +0200, Vlastimil Babka wrote:
> On 10/5/21 13:09, Michal Hocko wrote:
> > On Tue 05-10-21 11:20:51, Vlastimil Babka wrote:
> > [...]
> >> > --- a/include/linux/gfp.h
> >> > +++ b/include/linux/gfp.h
> >> > @@ -209,7 +209,11 @@ struct vm_area_struct;
> >> >   * used only when there is no reasonable failure policy) but it is
> >> >   * definitely preferable to use the flag rather than opencode endless
> >> >   * loop around allocator.
> >> > - * Using this flag for costly allocations is _highly_ discouraged.
> >> > + * Use of this flag may lead to deadlocks if locks are held which would
> >> > + * be needed for memory reclaim, write-back, or the timely exit of a
> >> > + * process killed by the OOM-killer. Dropping any locks not absolutely
> >> > + * needed is advisable before requesting a %__GFP_NOFAIL allocate.
> >> > + * Using this flag for costly allocations (order>1) is _highly_ discouraged.
> >>
> >> We define costly as 3, not 1. But sure it's best to avoid even order>0 for
> >> __GFP_NOFAIL. Advising order>1 seems arbitrary though?
> >
> > This is not completely arbitrary. We have a warning for any higher order
> > allocation.
> > rmqueue:
> > 	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>
> Oh, I missed that.
>
> > I do agree that "Using this flag for higher order allocations is
> > _highly_ discouraged.
>
> Well, with the warning in place this is effectively forbidden, not just
> discouraged.

Yup, especially as it doesn't obey __GFP_NOWARN.

See commit de2860f46362 ("mm: Add kvrealloc()") as a direct result
of unwittingly tripping over this warning when adding __GFP_NOFAIL
annotations to replace open coded high-order kmalloc loops that have
been in place for a couple of decades without issues.

Personally, I think that the way __GFP_NOFAIL is first recommended
over open coded loops, and only later found to be effectively
forbidden and in need of replacement with open coded loops, is a
complete mess.
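
To make the point about __GFP_NOWARN concrete, here is a minimal
userspace sketch of the condition in the WARN_ON_ONCE() quoted above.
The MODEL_* names and their bit values are made up for illustration
and are not the kernel's real __GFP_* definitions, and the "once"
behaviour of WARN_ON_ONCE() is not modelled:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real __GFP_* definitions. */
#define MODEL_GFP_NOFAIL (1u << 0)
#define MODEL_GFP_NOWARN (1u << 1)

/*
 * Models the condition quoted from rmqueue():
 *     (gfp_flags & __GFP_NOFAIL) && (order > 1)
 * The NOWARN bit is never consulted, so it cannot silence the warning.
 */
static bool model_nofail_order_warns(unsigned int gfp_flags, unsigned int order)
{
	return (gfp_flags & MODEL_GFP_NOFAIL) && (order > 1);
}

/* Hypothetical self-check walking through the cases discussed here. */
static void model_selfcheck(void)
{
	/* order 0 and order 1 NOFAIL requests stay quiet */
	assert(!model_nofail_order_warns(MODEL_GFP_NOFAIL, 0));
	assert(!model_nofail_order_warns(MODEL_GFP_NOFAIL, 1));
	/* order 2 and above trips the check ... */
	assert(model_nofail_order_warns(MODEL_GFP_NOFAIL, 2));
	/* ... and adding NOWARN makes no difference */
	assert(model_nofail_order_warns(MODEL_GFP_NOFAIL | MODEL_GFP_NOWARN, 2));
	/* without NOFAIL the check never fires, whatever the order */
	assert(!model_nofail_order_warns(MODEL_GFP_NOWARN, 3));
}
```

Calling model_selfcheck() from a test harness walks through those
cases. So anything above order 1 warns, well below the "costly"
threshold of 3 mentioned earlier in the thread.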

Not to mention the impossibility of using __GFP_NOFAIL with
kvmalloc() calls. Just what do we expect kmalloc_node(__GFP_NORETRY
| __GFP_NOFAIL) to do, exactly?

So, effectively, we have to open-code around kvmalloc() in
situations where failure is not an option. Even if we pass
__GFP_NOFAIL to __vmalloc(), it isn't guaranteed to succeed because
of the "we won't honor gfp flags passed to __vmalloc" semantics it
has.

Even the API constraints of kvmalloc() w.r.t. only doing the vmalloc
fallback if the gfp context is GFP_KERNEL - we already do GFP_NOFS
kvmalloc via memalloc_nofs_save/restore(), so this behavioural
restriction w.r.t. gfp flags just makes no sense at all.

That leads to us having to go back to writing extremely custom open
coded loops to avoid awful high-order kmalloc direct reclaim
behaviour and still fall back to vmalloc and to still handle NOFAIL
semantics we need:

https://lore.kernel.org/linux-xfs/20210902095927.911100-8-david@xxxxxxxxxxxxx/

So, really, the problems are much deeper here than just badly
documented, catch-22 rules for __GFP_NOFAIL - we can't even use
__GFP_NOFAIL consistently across the allocation APIs because it
changes allocation behaviours in unusable, self-defeating ways....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx