Re: [PATCH v4 12/66] kernel/fork: Use maple tree for dup_mmap() during forking

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Vlastimil Babka <vbabka@xxxxxxx> [211216 06:09]:
> On 12/1/21 15:29, Liam Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
> > 
> > The maple tree was already tracking VMAs in this function by an earlier
> > commit, but the rbtree iterator was being used to iterate the list.
> > Change the iterator to use a maple tree native iterator and switch to
> > the maple tree advanced API to avoid multiple walks of the tree during
> > insert operations.  Unexport the now-unused vma_store() function.
> > 
> > We track whether we need to free the VMAs and tree nodes through RCU
> > (ie whether there have been multiple threads that can see the mm_struct
> > simultaneously; by pthread(), ptrace() or looking at /proc/$pid/maps).
> > This setting is sticky because it's too tricky to decide when it's safe
> > to exit RCU mode.
> 
> I don't immediately see why enabling the RCU tracking in mmget is part of
> the dup_mmap() change?

Enabling the RCU tracking during the dup_mmap() change seems the correct
place to put a change dealing with counting the number of users of the
mmap.  This commit switches to using the maple tree as the primary
method of tracking the mmap, so it seems logical to me.

> 
> > For performance reasons we bulk allocate the maple tree nodes.  The node
> > calculations are done internally to the tree and use the VMA count and
> > assume the worst-case node requirements.  The VM_DONT_COPY flag does
> > not allow for the most efficient copy method of the tree and so a bulk
> > loading algorithm is used.
> > 
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> 
> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
> 
> >  static inline bool mmget_not_zero(struct mm_struct *mm)
> >  {
> > +	/*
> > +	 * There is a race below during task tear down that can cause the maple
> 
> What does 'below' refer to here?

If mm_users is zero and we arrive at mmget_not_zero, the maple tree will
enter RCU mode.  The only way mm_users is zero would be task teardown,
so we could have a race with teardown and another thread.. but the
result would be that the maple tree would delay freeing nodes - which is
not tragic.

I had a discussion with Hilf Danton in v2 of this patch set with the
code slightly different, but with the same outcome [1].

So 'below' means below the comment block.

> 
> > +	 * tree to enter rcu mode with only a single user.  If this race
> > +	 * happens, the result would be that the maple tree nodes would remain
> > +	 * active for an extra RCU read cycle.
> > +	 */
> > +	if (!mt_in_rcu(&mm->mm_mt))
> > +		mm_set_in_rcu(mm);
> >  	return atomic_inc_not_zero(&mm->mm_users);
> >  }
> >  
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index cc9bb95c7678..c9f8465d8ae2 100644

1.  https://lore.kernel.org/lkml/20210820174544.cvbdwpp6cfebos2m@revolver/




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux