Re: [RFC PATCH 3/7] mm: unexport vma_expand() / vma_shrink()

On Thu, Jun 27, 2024 at 01:45:34PM -0400, Liam R. Howlett wrote:
> * Lorenzo Stoakes <lstoakes@xxxxxxxxx> [240627 06:39]:
> > The vma_expand() and vma_shrink() functions are core VMA manipulation
> > functions which ultimately invoke VMA split/merge. In order to make these
> > testable, it is convenient to place all such core functions in a header
> > internal to mm/.
> >
>
> The sole user doesn't cause a split or merge, it relocates a vma by
> 'sliding' the window of the vma by expand/shrink with the moving of page
> tables in the middle of the slide.
>
> It slides to relocate the vma start/end and keep the vma pointer
> constant.

Yeah sorry, I actually don't know why I said this (and I said 'ultimately'
as well!). As you say, and as I was in fact aware, this doesn't invoke
split/merge. I will put this down to being tired when I wrote it :)

Will fix.
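
To make the 'slide' concrete, here is a rough userspace sketch of the range
arithmetic only. Everything in it (struct toy_vma and the toy_* helpers) is
made up for illustration and is not kernel API; the real code also has to deal
with the VMA iterator, locking and the page table move, none of which is
modelled here.

```c
#include <assert.h>

/* Toy stand-in for struct vm_area_struct: just the address range. */
struct toy_vma {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
};

/* Step 1: grow the bottom downwards so the range is [start - shift, end). */
void toy_expand_bottom(struct toy_vma *vma, unsigned long shift)
{
	assert(shift <= vma->start);	/* must not wrap below address 0 */
	vma->start -= shift;
}

/* Step 2: drop shift bytes from the top, leaving [start, end - shift). */
int toy_shrink_top(struct toy_vma *vma, unsigned long shift)
{
	if (shift >= vma->end - vma->start)
		return -1;	/* would leave an empty range */
	vma->end -= shift;
	return 0;
}

/* Slide the window down by shift; the toy_vma pointer itself never changes. */
int toy_slide_down(struct toy_vma *vma, unsigned long shift)
{
	toy_expand_bottom(vma, shift);
	/* In the kernel, the page tables are moved here, between the steps. */
	return toy_shrink_top(vma, shift);
}
```

So a toy VMA covering [0x7000, 0x9000) slid down by 0x1000 first becomes
[0x6000, 0x9000) and then [0x6000, 0x8000), with no split or merge anywhere.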

>
> > In addition, it is safer to abstract direct access to such functionality so
> > we can better control how other parts of the kernel use them, which
> > provides us the freedom to change how this functionality behaves as needed
> > without having to worry about how this functionality is used elsewhere.
> >
> > In order to service both these requirements, we provide abstractions for
> > the sole external user of these functions, shift_arg_pages() in fs/exec.c.
> >
> > We provide vma_expand_bottom() and vma_shrink_top() functions which better
> > match the semantics of what shift_arg_pages() is trying to accomplish by
> > explicitly wrapping the safe expansion of the bottom of a VMA and the
> > shrinking of the top of a VMA.
> >
> > As a result, we place the vma_shrink() and vma_expand() functions into
> > mm/internal.h to unexport them from use by any other part of the kernel.
>
> There is no point to have vma_shrink() have a wrapper since this is the
> only place it's ever used.  So we're wrapping a function that's only
> called once.

Yeah, that was a sketchy part of this change. I feel the vma_expand() case
is a lot more defensible; the vma_shrink() one, well, I expected I might get
some feedback on that anyway :)

This was obviously to try to find a way to abstract these away from fs/ in
some vaguely sensible fashion while retaining functionality.

>
> I'd rather a vma_relocate() do everything in this function than wrap
> them.  The only other thing it does is the page table moving and freeing
> - which we have to do in the vma code.  We'd expose something we want no
> one to use - but we already have two of those here..

Right, I think I was trying to avoid exposing _the whole thing_, as it's so
specific and not so nice to make available, but at the same time it is
perhaps the only reasonable way forward that avoids the vma_shrink()
micro-wrapper.

So yeah, will rework with a vma_relocate() or similar. As you say, we can't
really get away from exposing something nasty here.
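
Purely as a sketch of the shape this might take (not the actual rework), a
single entry point could fold the "nothing in the way" check, the expand and
the shrink into one call. The toy model below is userspace C over a sorted
array of ranges; toy_vma_relocate() and struct toy_vma are hypothetical names,
and the page table move/free is only marked by a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct vm_area_struct: just the address range. */
struct toy_vma {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
};

/*
 * Hypothetical single entry point: move vmas[idx] down by shift bytes.
 * vmas[] is assumed sorted by address and non-overlapping.
 * Returns 0 on success, -1 if the shift is invalid or a VMA is in the way.
 */
int toy_vma_relocate(struct toy_vma *vmas, size_t idx, unsigned long shift)
{
	struct toy_vma *vma = &vmas[idx];
	unsigned long new_start, new_end;

	if (shift > vma->start)
		return -1;	/* would wrap below address 0 */

	new_start = vma->start - shift;
	new_end = vma->end - shift;
	assert(new_start <= new_end);	/* mirrors the BUG_ON() in the patch */

	/* No VMA may sit between where we are and where we want to go. */
	if (idx > 0 && vmas[idx - 1].end > new_start)
		return -1;

	vma->start = new_start;	/* expand: [new_start, old_end) */
	/* The page table move and freeing would happen here. */
	vma->end = new_end;	/* shrink: [new_start, new_end) */

	return 0;
}
```

The caller then only ever sees one operation, and the expand/shrink pair
stays an implementation detail.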

>
> >
> > Signed-off-by: Lorenzo Stoakes <lstoakes@xxxxxxxxx>
> > ---
> >  fs/exec.c          | 26 +++++--------------
> >  include/linux/mm.h |  9 +++----
> >  mm/internal.h      |  6 +++++
> >  mm/mmap.c          | 65 ++++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 82 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 40073142288f..1cb3bf323e0f 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -700,25 +700,14 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> >  	unsigned long length = old_end - old_start;
> >  	unsigned long new_start = old_start - shift;
> >  	unsigned long new_end = old_end - shift;
> > -	VMA_ITERATOR(vmi, mm, new_start);
> > +	VMA_ITERATOR(vmi, mm, 0);
> >  	struct vm_area_struct *next;
> >  	struct mmu_gather tlb;
> > +	int ret;
> >
> > -	BUG_ON(new_start > new_end);
> > -
> > -	/*
> > -	 * ensure there are no vmas between where we want to go
> > -	 * and where we are
> > -	 */
> > -	if (vma != vma_next(&vmi))
> > -		return -EFAULT;
> > -
> > -	vma_iter_prev_range(&vmi);
> > -	/*
> > -	 * cover the whole range: [new_start, old_end)
> > -	 */
> > -	if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
> > -		return -ENOMEM;
> > +	ret = vma_expand_bottom(&vmi, vma, shift, &next);
> > +	if (ret)
> > +		return ret;
> >
> >  	/*
> >  	 * move the page tables downwards, on failure we rely on
> > @@ -730,7 +719,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> >
> >  	lru_add_drain();
> >  	tlb_gather_mmu(&tlb, mm);
> > -	next = vma_next(&vmi);
> > +
> >  	if (new_end > old_start) {
> >  		/*
> >  		 * when the old and new regions overlap clear from new_end.
> > @@ -749,9 +738,8 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> >  	}
> >  	tlb_finish_mmu(&tlb);
> >
> > -	vma_prev(&vmi);
> >  	/* Shrink the vma to just the new range */
> > -	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
> > +	return vma_shrink_top(&vmi, vma, shift);
> >  }
> >
> >  /*
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 4d2b5538925b..e3220439cf75 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3273,11 +3273,10 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
> >
> >  /* mmap.c */
> >  extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
> > -extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > -		      unsigned long start, unsigned long end, pgoff_t pgoff,
> > -		      struct vm_area_struct *next);
> > -extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > -		       unsigned long start, unsigned long end, pgoff_t pgoff);
> > +extern int vma_expand_bottom(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +			     unsigned long shift, struct vm_area_struct **next);
> > +extern int vma_shrink_top(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +			  unsigned long shift);
> >  extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
> >  extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
> >  extern void unlink_file_vma(struct vm_area_struct *);
> > diff --git a/mm/internal.h b/mm/internal.h
> > index c8177200c943..f7779727bb78 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1305,6 +1305,12 @@ static inline struct vm_area_struct
> >  			  vma_policy(vma), new_ctx, anon_vma_name(vma));
> >  }
> >
> > +int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +	       unsigned long start, unsigned long end, pgoff_t pgoff,
> > +		      struct vm_area_struct *next);
> > +int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +	       unsigned long start, unsigned long end, pgoff_t pgoff);
> > +
> >  enum {
> >  	/* mark page accessed */
> >  	FOLL_TOUCH = 1 << 16,
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index e42d89f98071..574e69a04ebe 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -3940,6 +3940,71 @@ void mm_drop_all_locks(struct mm_struct *mm)
> >  	mutex_unlock(&mm_all_locks_mutex);
> >  }
> >
> > +/*
> > + * vma_expand_bottom() - Expands the bottom of a VMA downwards. An error will
> > + *                       arise if there is another VMA in the expanded range, or
> > + *                       if the expansion fails. This function leaves the VMA
> > + *                       iterator, vmi, positioned at the newly expanded VMA.
> > + * @vmi: The VMA iterator.
> > + * @vma: The VMA to modify.
> > + * @shift: The number of bytes by which to expand the bottom of the VMA.
> > + * @next: Output parameter, pointing at the VMA immediately succeeding the newly
> > + *        expanded VMA.
> > + *
> > + * Returns: 0 on success, an error code otherwise.
> > + */
> > +int vma_expand_bottom(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +		      unsigned long shift, struct vm_area_struct **next)
> > +{
> > +	unsigned long old_start = vma->vm_start;
> > +	unsigned long old_end = vma->vm_end;
> > +	unsigned long new_start = old_start - shift;
> > +	unsigned long new_end = old_end - shift;
> > +
> > +	BUG_ON(new_start > new_end);
> > +
> > +	vma_iter_set(vmi, new_start);
> > +
> > +	/*
> > +	 * ensure there are no vmas between where we want to go
> > +	 * and where we are
> > +	 */
> > +	if (vma != vma_next(vmi))
> > +		return -EFAULT;
> > +
> > +	vma_iter_prev_range(vmi);
> > +
> > +	/*
> > +	 * cover the whole range: [new_start, old_end)
> > +	 */
> > +	if (vma_expand(vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
> > +		return -ENOMEM;
> > +
> > +	*next = vma_next(vmi);
> > +	vma_prev(vmi);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * vma_shrink_top() - Reduce an existing VMA's memory area by shift bytes from
> > + *                    the top of the VMA.
> > + * @vmi: The VMA iterator, must be positioned at the VMA.
> > + * @vma: The VMA to modify.
> > + * @shift: The number of bytes by which to shrink the VMA.
> > + *
> > + * Returns: 0 on success, an error code otherwise.
> > + */
> > +int vma_shrink_top(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +		   unsigned long shift)
> > +{
> > +	if (shift >= vma->vm_end - vma->vm_start)
> > +		return -EINVAL;
> > +
> > +	return vma_shrink(vmi, vma, vma->vm_start, vma->vm_end - shift,
> > +			  vma->vm_pgoff);
> > +}
> > +
> >  /*
> >   * initialise the percpu counter for VM
> >   */
> > --
> > 2.45.1
> >



