Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Reclamation interactions with RCU

"NeilBrown" <neilb@xxxxxxx> · Wed, 13 Mar 2024 09:09:44 +1100

On Wed, 13 Mar 2024, Vlastimil Babka wrote:
> On 3/3/24 23:45, NeilBrown wrote:
> > On Sat, 02 Mar 2024, Kent Overstreet wrote:
> >> 
> >> *nod* 
> >> 
> >> > I suspect that most places where there is a non-error fallback already
> >> > use NORETRY or RETRY_MAYFAIL or similar.
> >> 
> >> NORETRY and RETRY_MAYFAIL actually weren't on my radar, and I don't see
> >> _tons_ of uses for either of them - more for NORETRY.
> >> 
> >> My go-to is NOWAIT in this scenario though; my common pattern is "try
> >> nonblocking with locks held, then drop locks and retry GFP_KERNEL".
> >>  
> >> > But I agree that changing the meaning of GFP_KERNEL has a potential to
> >> > cause problems.  I support promoting "GFP_NOFAIL" which should work at
> >> > least up to PAGE_ALLOC_COSTLY_ORDER (8 pages).
> >> 
> >> I'd support this change.
> >> 
> >> > I'm unsure how it should behave in PF_MEMALLOC_NOFS and
> >> > PF_MEMALLOC_NOIO context.  I suspect Dave would tell me it should work in
> >> > these contexts, in which case I'm sure it should.
> >> > 
> >> > Maybe we could then deprecate GFP_KERNEL.
> >> 
> >> What do you have in mind?
> > 
> > I have in mind a more explicit statement of how much waiting is
> > acceptable.
> > 
> > GFP_NOFAIL - wait indefinitely
> > GFP_KILLABLE - wait indefinitely unless fatal signal is pending.
> > GFP_RETRY - may retry but deadlock, though unlikely, is possible.  So
> >             don't wait indefinitely.  May abort more quickly if fatal
> >             signal is pending.
> > GFP_NO_RETRY - only try things once.  This may sleep, but will give up
> >             fairly quickly.  Either deadlock is a significant
> >             possibility, or alternate strategy is fairly cheap.
> > GFP_ATOMIC - don't sleep - same as current.
> > 
> > I don't see how "GFP_KERNEL" fits into that spectrum.  The definition of
> > "this will try really hard, but might fail and we can't really tell you
> > what circumstances it might fail in" isn't fun to work with.
> 
> The problem is if we set out to change everything from GFP_KERNEL to one of
> the above, it will take many years. So I think it would be better to just
> change the semantics of GFP_KERNEL too.

It took a long time to completely remove the BKL too.  I don't think
this is something we should be afraid of.  We can easily use tools to
remind us about the work that needs doing and the progress being made.

> 
> If we change it to remove the "too-small to fail" rule, we might suddenly
> introduce crashes in an unknown number of places, so I don't think that's feasible.

I don't think anyone wants that.

> 
> But if we change it to effectively mean GFP_NOFAIL (for non-costly
> allocations), there should be a manageable number of places to change to a
> variant that allows failure. Also if these places are GFP_KERNEL by mistake
> today, and should in fact allow failure, they would be already causing
> problems today, as the circumstances where too-small-to-fail is violated are
> quite rare (IIRC just being an oom victim, so somewhat close to
> GFP_KILLABLE). So changing GFP_KERNEL to GFP_NOFAIL should be the lowest
> risk (one could argue for GFP_KILLABLE but I'm afraid many places don't
> really handle that as they assume the too-small-to-fail without exceptions
> and are unaware of the oom victim loophole, and failing on any fatal signal
> increases the chances of this happening).

I think many uses of GFP_KERNEL should be changed to GFP_NOFAIL.
But that ISN'T just changing the flag (or the meaning of the flag).  It
also involves removing the code that handles failure.  That is, to me,
a strong argument against redefining GFP_KERNEL to mean GFP_NOFAIL.  We (I)
really do want that error handling code to be removed.  That needs to be
done on a case-by-case basis.
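
To illustrate what such a per-call-site conversion looks like, here is a
rough userspace sketch.  It is not kernel code: xalloc_mayfail() and
xalloc_nofail() are made-up stand-ins for a GFP_KERNEL-style and a
GFP_NOFAIL-style allocation, and make_name_before()/make_name_after()
are hypothetical callers.  The point is only that the "after" version
loses its error path and its error-returning signature, which a change
to the flag alone cannot do.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an allocation that may fail (GFP_KERNEL-like). */
static void *xalloc_mayfail(size_t size)
{
	return malloc(size);
}

/*
 * Hypothetical stand-in for an allocation that never returns NULL
 * (GFP_NOFAIL-like).  This sketch simply retries; a real allocator
 * would wait for reclaim to make progress rather than spin.
 */
static void *xalloc_nofail(size_t size)
{
	void *p;

	if (size == 0)
		size = 1;	/* malloc(0) may legitimately return NULL */
	while ((p = malloc(size)) == NULL)
		;		/* keep trying until memory is available */
	return p;
}

/* Before: the caller must check for failure and report it upwards. */
static int make_name_before(const char *src, char **out)
{
	char *buf = xalloc_mayfail(strlen(src) + 1);

	if (!buf)
		return -1;	/* error path that every caller must handle */
	strcpy(buf, src);
	*out = buf;
	return 0;
}

/* After: no failure is possible, so the error path and int return go away. */
static char *make_name_after(const char *src)
{
	char *buf = xalloc_nofail(strlen(src) + 1);

	strcpy(buf, src);
	return buf;
}
```

Every caller of make_name_before() also carries a check of the return
value, and that check needs to go as well.  That is the part that has
to be reviewed one call site at a time.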

Thanks,
NeilBrown