Re: [PATCH 6.15] mm/vma: add give_up_on_oom option on modify/merge, use in uffd release

"Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx> · Fri, 21 Mar 2025 11:27:34 -0400

+cc Peter due to uffd interests

* Pedro Falcato <pfalcato@xxxxxxx> [250321 07:26]:
> On Fri, Mar 21, 2025 at 10:09:37AM +0000, Lorenzo Stoakes wrote:
> > Currently, if a VMA merge fails due to an OOM condition arising on commit
> > merge or a failure to duplicate anon_vma's, we report this so the caller
> > can handle it.
> > 
> > However there are cases where the caller is only ostensibly trying a
> > merge, and doesn't mind if it fails due to this condition.
> >
> 
> Ok, so here's my problem with your idea: I don't think merge should be exposed
> to vma_modify() callers. Right now (at least AIUI), you want to modify a given
> VMA, you call vma_modify(), and it gives you a vma you can straight up modify
> without any problems. Essentially breaks down any VMAs necessary. This feels
> contractually simple and easy to use, and I don't think leaking details about
> merging is the correct approach here.
> 
> > Since we do not want to introduce an implicit assumption that we only
> > actually modify VMAs after OOM conditions might arise, add a 'give up on
> > oom' option and make an explicit contract that, should this flag be set, we
> > absolutely will not modify any VMAs should OOM arise and just bail out.
> >
> 
> Thus, to me the most natural solution is still mine. Do you think it places too
> many constraints on vma_modify()? vma_modify() on a single VMA, without
> splitting, Just Working(tm) is a sensible expectation (and vma_merge being fully
> best-effort). Things like mprotect() failing due to OOM are also pretty disastrous,
> so if we could limit that it'd be great.
> 
> In any case, your solution looks palatable to me, but I want to make
> sure we're not making this excessively complicated.

Either solution is fine with me, but...

I hate both of them, and I (mostly) blame uffd.  Some blame is on syzbot
for exposing me to this unreachable failure. ;-)

I think Lorenzo's patch points out that we should have a better way to
deal with a return and a vma pointer and we basically have that a lot of
other places with the structures that we thread through to deal with
mostly unreachable errors that syzbot injects.  I dislike flags passed
in to treat things differently.  We are adding an 8th argument to the
function (of type boolean) to work around this..

I think Pedro's patch is working around the same issue of return value
vs return pointers.  Other places we pass in &prev and just do what we
need to in the caller and just check the return value - which I also
hate, especially when you look at the assembly generated to deal with a
"non-problem" problem.

I have no better solution and need to spend more time looking into it,
but the number of times we have syzbot failing a random allocation and
finding issues.. well, it's basically a class of bugs we handle very
poorly throughout our stack.

Realistically, I *hope* that Lorenzo's additional argument will make
code authors think twice about the failure scenario when calling the
function.

Pedro's code fixes this caller, but leaves the possibility of someone
missing this again in the future, I think?  This could be made more
obvious with some clever naming of functions, perhaps try_<something>?

We are essentially avoiding the compiler from catching the error for us
by returning that ERR_PTR(), which (keeping with the theme of my email)
I hate.  It's fine for little functions but we've made a mess of it too
often.

Reality will probably not align with the realistic view and people will
just copy/paste from wherever they saw it called... so we should think
twice about the failure scenarios on code review and I think a flag
(or a function name change?) might make this more obvious.

Thanks,
Liam