Re: [PATCH v2 2/6] mm/munmap: Replace can_modify_mm with can_modify_vma

"Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx> · Mon, 12 Aug 2024 15:25:26 -0400

* Jeff Xu <jeffxu@xxxxxxxxxx> [240812 13:38]:
> On Mon, Aug 12, 2024 at 9:58 AM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> >
> > * Jeff Xu <jeffxu@xxxxxxxxxx> [240812 10:30]:
> > > + Kees who commented on mseal() series before. Please keep Kees in the
> > > cc for future response to this series.
> > >
> > > On Fri, Aug 9, 2024 at 12:25 PM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> > > >
> > > > * Pedro Falcato <pedro.falcato@xxxxxxxxx> [240809 14:53]:
> > > > > On Fri, Aug 9, 2024 at 5:48 PM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > * Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> [240809 12:15]:
> > > > > > > * Pedro Falcato <pedro.falcato@xxxxxxxxx> [240807 17:13]:
> > > > > > > > We were doing an extra mmap tree traversal just to check if the entire
> > > > > > > > range is modifiable. This can be done when we iterate through the VMAs
> > > > > > > > instead.
> > > > > > > >
> > > > > > > > Signed-off-by: Pedro Falcato <pedro.falcato@xxxxxxxxx>
> > > > > > > > ---
> > > > > > > >  mm/mmap.c | 13 +------------
> > > > > > > >  mm/vma.c  | 23 ++++++++++++-----------
> > > > > > > >  2 files changed, 13 insertions(+), 23 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > > > > > index 4a9c2329b09..c1c7a7d00f5 100644
> > > > > > > > --- a/mm/mmap.c
> > > > > > > > +++ b/mm/mmap.c
> > > > > > > > @@ -1740,18 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > > > > > >             unsigned long start, unsigned long end, struct list_head *uf,
> > > > > > > >             bool unlock)
> > > > > > > >  {
> > > > > > > > -   struct mm_struct *mm = vma->vm_mm;
> > > > > > > > -
> > > > > > > > -   /*
> > > > > > > > -    * Check if memory is sealed before arch_unmap.
> > > > > > > > -    * Prevent unmapping a sealed VMA.
> > > > > > > > -    * can_modify_mm assumes we have acquired the lock on MM.
> > > > > > > > -    */
> > > > > > > > -   if (unlikely(!can_modify_mm(mm, start, end)))
> > > > > > > > -           return -EPERM;
> > > > > > > > -
> > > > > > > > -   arch_unmap(mm, start, end);
> > > > > > > > -   return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> > > > > > > > +   return do_vmi_align_munmap(vmi, vma, vma->vm_mm, start, end, uf, unlock);
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  /*
> > > > > > > > diff --git a/mm/vma.c b/mm/vma.c
> > > > > > > > index bf0546fe6ea..7a121bcc907 100644
> > > > > > > > --- a/mm/vma.c
> > > > > > > > +++ b/mm/vma.c
> > > > > > > > @@ -712,6 +712,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > > > > > >             if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
> > > > > > > >                     goto map_count_exceeded;
> > > > > > > >
> > > > > > > > +           /* Don't bother splitting the VMA if we can't unmap it anyway */
> > > > > > > > +           if (!can_modify_vma(vma)) {
> > > > > > > > +                   error = -EPERM;
> > > > > > > > +                   goto start_split_failed;
> > > > > > > > +           }
> > > > > > > > +
> > > > > > >
> > > > > > > Would this check be better placed in __split_vma()?  It could replace
> > > > > > > both this and the next chunk of code.
> > > > > >
> > > > > > not quite.
> > > > >
> > > > > Yeah, I was going to say that splitting a sealed VMA is okay (and we
> > > > > allow it on mlock and madvise).
> > > >
> > > > splitting a sealed vma wasn't supposed to be okay.. but it is Jeff's
> > > > feature.  Jeff?
> > > >
> > > Splitting a sealed VMA is wrong.
> > > Whoever wants to split a sealed VMA should  answer this question
> > > first: what is the use case for splitting a sealed VMA?
> >
> > If we lower the checks to __split_vma() and anywhere that does entire
> > vma modifications, then we would avoid modifications of the vma.  This
> > particular loop breaks on the final vma, if it needs splitting, so we'd
> > still need the check in the main loop to ensure the full vma isn't
> > mseal()'ed.  Splitting the start/end could be covered by the
> > __split_vma() function.
> >
> > >
> > > The V2 series doesn't have selftest change which indicates lack of
> > > testing. The out-of-loop check is positioned nearer to the API entry
> > > point and separated from internal business logic, thereby minimizing
> > > the testing requirements. However, as we move the sealing check
> > > further inward and intertwine it with business logic, greater test
> > > coverage becomes necessary to ensure  the correctness of  sealing
> > > is preserved.
> >
> > I would have expected more complete test coverage and not limited to
> > what is expected to happen with an up front test.  Changes may happen
> > that you aren't Cc'ed on (or when you are away, etc) that could cause a
> > silent failure which could go undetected for a prolonged period of time.
> >
> > If splitting a vma isn't okay, then it would be safer to test at least
> > in some scenarios in the upstream tests.  Ideally, all tests are
> > upstream so everyone can run the testing.
> >
> We will want to be careful about our expectation of  test coverage
> offered in selftest. When adding mseal, I added 40+ test cases to
> ensure mseal didn't regress on existing mm api, i.e. you can take the
> mseal test , make a small modification (removing seal=1) and run on an
> old version of kernel and they will pass. I think it is wrong to
> expect the selftest is all it takes to find a regression if the dev is
> doing a  *** major design/feature change ***. While it is possible to
> write test cases to guide all future changes, doing so requires much
> bigger effort with diminishing returns, essentially  it is testing an
> "impossible to reach cases" in existing code.

One of the main points of selftests is to ensure that someone doesn't
break your feature.  Your tests do not accomplish that, otherwise you
wouldn't be requesting more tests with these changes - this isn't a bug
fix.

The initial commit also does not say that the vma won't be split so it
doesn't seem fair to blame someone for not testing something that wasn't
tested before nor existed as a restriction in the documentation [1].
Note that splitting a vma just increases the vma count and doesn't
change attributes of the vma or affect the memory. Pedro wasn't involved
in the initial discussions, so he doesn't have the background about how
mseal() was treated differently.

Some of the language in your responses is troubling and seems to
indicate that the design/feature was done in a way to optimise
backporting (avoiding the business logic that has changed), and to
minimize the testing requirement.

Although I can understand (empathize, even) that you cannot test
everything from the start, you can grow your test set to avoid future
regressions and misunderstandings on how things should work.  (If the
dont_split_vma() tests fail, then I would probably question if I can
split a vma..)

Splitting a vma is a pretty fundamental operation, along with unmapping,
mapping, and merging.  I would expect the selftests to, at least, cover
the fundamentals, if they are to be affected.  If testing is lacking,
can you please provide test cases so that we can validate fixes for your
feature so it works as you intended/need it to work?  We don't know if
we are violating rules that aren't clear.
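
As a rough illustration of the kind of test case meant here, a minimal
sketch follows.  It is not taken from the existing selftests: the
function name test_sealed_partial_munmap() is hypothetical, and the
fallback syscall number 462 for mseal() is an assumption for builds
whose headers do not define __NR_mseal yet.  It only covers one
scenario, a munmap() of the middle of a sealed mapping, which would
require splitting the vma.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: mseal() is syscall 462 when the headers lack __NR_mseal. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/*
 * Hypothetical test: unmapping the middle page of a sealed three-page
 * mapping would need to split the vma, so it must fail with EPERM and
 * leave the memory intact.
 *
 * Returns 0 on pass, 1 if mseal() could not be applied (skip),
 * -1 on failure.
 */
static int test_sealed_partial_munmap(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len = 3 * (size_t)page;
	unsigned char *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Marker used later to confirm the page was not unmapped. */
	p[page] = 0x5a;

	/* Seal the whole range; treat any failure here as "skip". */
	if (syscall(__NR_mseal, p, len, 0UL) != 0) {
		munmap(p, len);
		return 1;
	}

	/* The partial unmap must be rejected with EPERM. */
	errno = 0;
	if (munmap(p + page, (size_t)page) == 0 || errno != EPERM)
		return -1;

	/* The sealed mapping cannot be freed, so it is left in place. */
	return p[page] == 0x5a ? 0 : -1;
}
```

A test like this passes whether the check is done up front or per vma,
so it would only be a starting point; whether a split that leaves the
memory untouched should also be rejected is the open question above.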

We have been waiting since Thursday for you to run some tests to see if
these patches are okay, but we have no idea what those tests are or
where they exist.  This is problematic as you seem to be the only one
able to complete a task subject to interpretation - as you might
imagine, this would be a problem for Chromium if your contact
information were to change, or you were away for several months.

You may consider this a major design/feature change, but it is a
necessary change to get back the performance and to keep mseal()
functioning.

Thanks,
Liam

[1]. https://lore.kernel.org/linux-mm/20240415163527.626541-1-jeffxu@xxxxxxxxxxxx/