Re: [PATCH] kernel/fork: Be more careful about dup_mmap() failures

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 24, 2025 at 01:19:45PM -0500, Liam R. Howlett wrote:
> Adding Oleg, who I forgot on the start of this patch..
>
>
> * Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> [250124 06:15]:
> > On Thu, Jan 23, 2025 at 03:58:49PM -0500, Liam R. Howlett wrote:
> > > From: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
> > >
> > > In the even that there is a failure during dup_mmap(), the maple tree
> > > can be left in an unsafe state for other iterators besides the exit
> > > path.
> > >
> > > The unsafe state is created after the tree is cloned, but before the
> > > vmas are replaced; if a vma allocation fails (for instance), then the
> > > tree will have a marker (XA_ZERO_ENTRY) to denote where to stop
> > > destroying vmas on the exit path.  This marker replaces a vma in the
> > > tree and may be treated as a pointer to a vma in iterators besides the
> > > special case exit_mmap() iterator.
> >
> > Thanks for addresing this.
>
> Thanks for looking at the patch.
>
> >
> > In the original issue, the problem arose in vm_area_dup() which I tracked
> > down to vma_iter_load() -> mas_walk() (see [0]).
> >
> > I'm not sure if your change would actually have addressed this? Would the
> > MMF_UNSTABLE flag prevent this code path from being taken (if so where?),
> > if not don't we need to add a check for this somewhere?
> >
> > [0]:https://lore.kernel.org/all/aa2c1930-becc-4bc5-adfb-96e88290acc7@lucifer.local/
>
> Remote vma accesses use check_stable_address_space() to avoid races
> with the oom killer, which checks MMF_UNSTABLE.
>
> Yeah, uprobe needs to do this and doesn't today.
>
> Oleg suggested this be done in build_map_info(), but the MMF_UNSTABLE
> bit is set under the mmap read lock and that function uses the read lock
> on the rmap, takes an mm references and records the address -  then
> revisits the mm later to get the vma in register_for_each_vma().
>
> I think we need to check in register_for_each_vma(), prior to looking up
> the vma.  This way we hold the write lock on the mm and will ensure the
> vma is stable from both the fork setup and a potential OOM events.  The
> concern is that we have an invalid vma or the prev/next is invalid and
> that will only be a concern when operating on the vma itself, so this
> would be the safest place imo.
>
> The problem that has been created is that the next vma may be invalid.
> This arises when the rmap is used to find a vma then changes that vma.
> In this scenario, the mmap lock of the vma needs to be held and that mm
> needs to be checked as stable/not oom'ed.  So we can limit the checking
> of the code to the scenario where another mm's vma is found and
> modified - which may cause a merge and look at the next vma.
>
> I haven't found anywhere else that looks up another mm_struct's vma and
> makes a change to the vma.  Considering what that would mean, it makes
> sense that this is rare.
>
> So, I'll just do this:
>
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1260,6 +1260,9 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>                  * returns NULL in find_active_uprobe_rcu().
>                  */
>                 mmap_write_lock(mm);
> +               if (check_stable_address_space(mm)
> +                               goto unlock;
> +
>
>

Yeah I think on reflection this is sensible, at least as a proximate
solution. It certainly doesn't harm to apply the MMF_UNSTABLE flag on top
of this and we can always expand where we check/what we do with this.

>
>
> >
> > >
> > > All the locks are dropped before the exit_mmap() call, but the
> > > incomplete mm_struct can be reached through (at least) the rmap finding
> > > the vmas which have a pointer back to the mm_struct.
> > >
> > > Up to this point, there have been no issues with being able to find an
> > > mm_sturct that was only partially initialised.  Syzbot was able to make
> > > the incomplete mm_struct fail with recent forking changes, so it has
> > > been proven unsafe to use the mm_sturct that hasn't been initialised, as
> > > referenced in the link below.
> > >
> > > Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to
> > > invalid mm") fixed the uprobe access, it does not completely remove the
> > > race.
> > >
> > > This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on
> > > the oom side (even though this is extremely unlikely to be selected as
> > > an oom victim in the race window), and sets MMF_UNSTABLE to avoid other
> > > potential users from using a partially initialised mm_struct.
> >
> > Good to have it set also I think.
> >
> > We were concerned that _somehow_ MMF_UNSTABLE might cause some issue in
> > OOM, which was why when I first thought of this we decided maybe not just
> > to be safe, but I am confident this will not be an issue as indicating that
> > something is about to be torn down and is unstable when it is in fact
> > unstable and about to be torn down is correct, plus I checked all the paths
> > where this impacted and found no issues.
> >
> > And in any case, setting this flag avoids the problem...!
> >
> > >
> > > Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@xxxxxxxxxx/
> > > Fixes: d240629148377 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
> > > Cc: Jann Horn <jannh@xxxxxxxxxx>
> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > Cc: Michal Hocko <mhocko@xxxxxxxx>
> > > Cc: Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx>
> > > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> > > ---
> > >  kernel/fork.c | 17 ++++++++++++++---
> > >  1 file changed, 14 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > index ded49f18cd95c..20b2120f019ca 100644
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -760,7 +760,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> > >  		mt_set_in_rcu(vmi.mas.tree);
> > >  		ksm_fork(mm, oldmm);
> > >  		khugepaged_fork(mm, oldmm);
> > > -	} else if (mpnt) {
> > > +	} else {
> > > +
> > >  		/*
> > >  		 * The entire maple tree has already been duplicated. If the
> > >  		 * mmap duplication fails, mark the failure point with
> > > @@ -768,8 +769,18 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> > >  		 * stop releasing VMAs that have not been duplicated after this
> > >  		 * point.
> > >  		 */
> > > -		mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> > > -		mas_store(&vmi.mas, XA_ZERO_ENTRY);
> > > +		if (mpnt) {
> > > +			mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> > > +			mas_store(&vmi.mas, XA_ZERO_ENTRY);
> > > +			/* Avoid OOM iterating a broken tree */
> > > +			set_bit(MMF_OOM_SKIP, &mm->flags);
> > > +		}
> > > +		/*
> > > +		 * The mm_struct is going to exit, but the locks will be dropped
> > > +		 * first.  Set the mm_struct as unstable is advisable as it is
> > > +		 * not fully initialised.
> > > +		 */
> > > +		set_bit(MMF_UNSTABLE, &mm->flags);
> >
> > Do we still need to do this if !mpnt?
>
> No, we probably don't need this now.  But it still makes sense to do
> this considering what we are saying.  There is a failed mm, let's mark
> it as unstable because it's going away in a moment - it's not stable,
> it's not even a complete mm.
>
> >
> > Also 'mpnt' is in the running for the worst variable name in mm...
> >
>
> mpnt is the oldMm vma Pointer, Now isn'T it?  I think that's self
> documenting.

:)))

>
> Thanks,
> Liam




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux