+cc Peter due to uffd interests * Pedro Falcato <pfalcato@xxxxxxx> [250321 07:26]: > On Fri, Mar 21, 2025 at 10:09:37AM +0000, Lorenzo Stoakes wrote: > > Currently, if a VMA merge fails due to an OOM condition arising on commit > > merge or a failure to duplicate anon_vma's, we report this so the caller > > can handle it. > > > > However there are cases where the caller is only ostensibly trying a > > merge, and doesn't mind if it fails due to this condition. > > > > Ok, so here's my problem with your idea: I don't think merge should be exposed > to vma_modify() callers. Right now (at least AIUI), you want to modify a given > VMA, you call vma_modify(), and it gives you a vma you can straight up modify > without any problems. Essentially breaks down any VMAs necessary. This feels > contractually simple and easy to use, and I don't think leaking details about > merging is the correct approach here. > > > Since we do not want to introduce an implicit assumption that we only > > actually modify VMAs after OOM conditions might arise, add a 'give up on > > oom' option and make an explicit contract that, should this flag be set, we > > absolutely will not modify any VMAs should OOM arise and just bail out. > > > > Thus, to me the most natural solution is still mine. Do you think it places too > many constraints on vma_modify()? vma_modify() on a single VMA, without > splitting, Just Working(tm) is a sensible expectation (and vma_merge being fully > best-effort). Things like mprotect() failing due to OOM are also pretty disastrous, > so if we could limit that it'd be great. > > In any case, your solution looks palatable to me, but I want to make > sure we're not making this excessively complicated. Either solution is fine with me, but... I hate both of them, and I (mostly) blame uffd. Some blame is on syzbot for exposing me to this unreachable failure. ;-) I think Lorenzo's patch points out that we should have a better way to deal with a return and a vma pointer and we basically have that a lot of other places with the structures that we thread through to deal with mostly unreachable errors that syzbot injects. I dislike flags passed in to treat things differently. We are adding an 8th argument to the function (of type boolean) to work around this.. I think Pedro's patch is working around the same issue of return value vs return pointers. Other places we pass in &prev and just do what we need to in the caller and just check the return value - which I also hate, especially when you look at the assembly generated to deal with a "non-problem" problem. I have no better solution and need to spend more time looking into it, but the number of times we have syzbot failing a random allocation and finding issues.. well, it's basically a class of bugs we handle very poorly throughout our stack. Realistically, I *hope* that Lorenzo's additional argument will make code authors think twice about the failure scenario when calling the function. Pedro's code fixes this caller, but leaves the possibility of someone missing this again in the future, I think? This could be made more obvious with some clever naming of functions, perhaps try_<something>? We are essentially avoiding the compiler from catching the error for us by returning that ERR_PTR(), which (keeping with the theme of my email) I hate. It's fine for little functions but we've made a mess of it too often. Reality will probably not align with the realistic view and people will just copy/paste from wherever they saw it called... so we should think twice about the failure scenarios on code review and I think a flag (or a function name change?) might make this more obvious. Thanks, Liam