Re: [PATCH 07/10] mm: avoid using vma_merge() for new VMAs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> [240808 14:34]:
> * Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> [240808 14:02]:
> > On Thu, Aug 08, 2024 at 06:45:43PM GMT, Vlastimil Babka wrote:
> > > On 8/5/24 14:13, Lorenzo Stoakes wrote:
> > > > In mmap_region() and do_brk_flags() we open code scenarios where we prefer
> > > > to use vma_expand() rather than invoke a full vma_merge() operation.
> > > >
> > > > Abstract this logic and eliminate all of the open-coding, and also use the
> > > > same logic for all cases where we add new VMAs to, rather than ultimately
> > > > use vma_merge(), rather use vma_expand().
> > > >
> > > > We implement this by replacing vma_merge_new_vma() with this newly
> > > > abstracted logic.
> > > >
> > > > Doing so removes duplication and simplifies VMA merging in all such cases,
> > > > laying the ground for us to eliminate the merging of new VMAs in
> > > > vma_merge() altogether.
> > > >
> > > > This makes it far easier to understand what is happening in these cases
> > > > avoiding confusion, bugs and allowing for future optimisation.
> > > >
> > > > As a result of this change we are also able to make vma_prepare(),
> > > > init_vma_prep(), vma_complete(), can_vma_merge_before() and
> > > > can_vma_merge_after() static and internal to vma.c.
> > >
> > > That's really great, but it would be even better if these code moves could
> > > be a separate patch as it would make reviewing so much easier. But with git
> > > diff's --color-moved to the rescue, let me try...
> > 
> > Will separate out on respin.
> > 
> > >
> > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> > > > ---
> > > >  mm/mmap.c                        |  79 ++---
> > > >  mm/vma.c                         | 482 +++++++++++++++++++------------
> > > >  mm/vma.h                         |  51 +---
> > > >  tools/testing/vma/vma_internal.h |   6 +
> > > >  4 files changed, 324 insertions(+), 294 deletions(-)
> 
> ...
> > > > + */
> > > > +struct vm_area_struct *vma_merge_new_vma(struct vma_merge_struct *vmg)
> > > > +{
> > > > +	bool is_special = vmg->flags & VM_SPECIAL;
> > > > +	struct vm_area_struct *prev = vmg->prev;
> > > > +	struct vm_area_struct *next = vmg->next;
> > > > +	unsigned long start = vmg->start;
> > > > +	unsigned long end = vmg->end;
> > > > +	pgoff_t pgoff = vmg->pgoff;
> > > > +	pgoff_t pglen = PHYS_PFN(end - start);
> > > > +
> > > > +	VM_WARN_ON(vmg->vma);
> > > > +
> > > > +	if (!prev && !next) {
> > > > +		/*
> > > > +		 * Since the caller must have determined that the requested
> > > > +		 * range is empty, vmg->vmi will be left pointing at the VMA
> > > > +		 * immediately prior.
> > > > +		 */
> > >
> > > OK that's perhaps not that obvious, as it seems copy_vma() is doing some
> > > special dance to ensure this. Should we add it to the ASSUMPTIONS and assert
> > > it, or is there a maple tree operation we can do to ensure it, ideally if
> > > it's very cheap if the iterator is already set the way we want it to be?
> > >
> > 
> > To be fair this is something that was previously assumed, and I just added
> > a comment.
> > 
> > Will add to assumptions, and again I think any assert should be done in
> > such a way that under non-CONFIG_DEBUG_VM nothing happens, maybe
> > VM_WARN_ON()?
> > 
> > Will try to come up with something.

Something like:

VM_BUG_ON(vma_iter_end(vmg->vmi) > start);

> > 
> > > > +		next = vmg->next = vma_next(vmg->vmi);

and:

VM_BUG_ON(vma_iter_addr(vmg->vmi) < end);

> > > > +		prev = vmg->prev = vma_prev(vmg->vmi);
> > > > +
> > > > +		/* Avoid maple tree re-walk. */
> > > > +		if (is_special && prev)
> > > > +			vma_iter_next_range(vmg->vmi);
> > >
> > > I wish I knew what this did but seems it's the same as the old code did so
> > > hopefully that's fine.
> > 
> > I think point is that we are about to exit, so we'd be left pointing at
> > prev. But since we're exiting in just a second, we want to be pointing at
> > the next vma which will become the prev of the next merge attempt.
> > 
> > Liam can maybe elucidate further.
> 
> What you have to remember is that the vma iterator (vmg->vmi above),
> contains (or, basically is) a maple state (usually written as mas).  We
> keep state of the maple tree walker so that we don't have to keep
> re-walking to find the same thing.  We move around the tree with this
> maple state because going prev/next is faster from leaves (almost always
> just the next thing in the nodes array of pointers).
> 
> We use the maple state to write as well, so the maple state needs to
> point to the correct location in the tree for a write.
> 
> The maple tree is a range-based tree, so each entry exists for a span of
> values.  A write happens at the lowest index and can overwrite
> subsequent values.  This means that the maple state needs to point to
> the range containing the lowest index for the write (if it's pointing to
> a node - it could walk from the top).
> 
> A side effect of writing to the lowest index is that we need to point to
> the previous vma if we are going to 'expand' the vma.  The range is
> essentially going to be from prev->start to "whatever we are expanding
> over".
> 
> In the old code, the vm_flags & VM_SPECIAL code meant there was no way
> an expansion was going to happen, but we've moved the maple state to the
> wrong location for a write of a new vma - so this vma_iter_next_range()
> just moves it back.  Then we "goto cannot_expand".
> 




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux