On Mon, Oct 09, 2023 at 05:22:33PM +0200, Vlastimil Babka wrote: > On 10/8/23 22:23, Lorenzo Stoakes wrote: > > mprotect() and other functions which change VMA parameters over a range > > each employ a pattern of:- > > > > 1. Attempt to merge the range with adjacent VMAs. > > 2. If this fails, and the range spans a subset of the VMA, split it > > accordingly. > > > > This is open-coded and duplicated in each case. Also in each case most of > > the parameters passed to vma_merge() remain the same. > > > > Create a new static function, vma_modify(), which abstracts this operation, > > accepting only those parameters which can be changed. > > > > To avoid the mess of invoking each function call with unnecessary > > parameters, create wrapper functions for each of the modify operations, > > parameterised only by what is required to perform the action. > > Nice! Thanks :) > > > Note that the userfaultfd_release() case works even though it does not > > split VMAs - since start is set to vma->vm_start and end is set to > > vma->vm_end, the split logic does not trigger. > > > > In addition, since we calculate pgoff to be equal to vma->vm_pgoff + (start > > - vma->vm_start) >> PAGE_SHIFT, and start - vma->vm_start will be 0 in this > > instance, this invocation will remain unchanged. > > > > Signed-off-by: Lorenzo Stoakes <lstoakes@xxxxxxxxx> > > --- > > fs/userfaultfd.c | 53 +++++++++----------------- > > include/linux/mm.h | 23 ++++++++++++ > > mm/madvise.c | 25 ++++--------- > > mm/mempolicy.c | 20 ++-------- > > mm/mlock.c | 24 ++++-------- > > mm/mmap.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++ > > mm/mprotect.c | 27 ++++---------- > > 7 files changed, 157 insertions(+), 108 deletions(-) > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > index a7c6ef764e63..9e5232d23927 100644 > > --- a/fs/userfaultfd.c > > +++ b/fs/userfaultfd.c > > @@ -927,11 +927,10 @@ static int userfaultfd_release(struct inode *inode, struct file *file) > > continue; > > } > > new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS; > > - prev = vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end, > > - new_flags, vma->anon_vma, > > - vma->vm_file, vma->vm_pgoff, > > - vma_policy(vma), > > - NULL_VM_UFFD_CTX, anon_vma_name(vma)); > > + prev = vma_modify_uffd(&vmi, prev, vma, vma->vm_start, > > + vma->vm_end, new_flags, > > + NULL_VM_UFFD_CTX); > > + > > if (prev) { > > vma = prev; > > } else { > > @@ -1331,7 +1330,6 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > > unsigned long start, end, vma_end; > > struct vma_iterator vmi; > > bool wp_async = userfaultfd_wp_async_ctx(ctx); > > - pgoff_t pgoff; > > > > user_uffdio_register = (struct uffdio_register __user *) arg; > > > > @@ -1484,26 +1482,18 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > > vma_end = min(end, vma->vm_end); > > > > new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags; > > - pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); > > - prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags, > > - vma->anon_vma, vma->vm_file, pgoff, > > - vma_policy(vma), > > - ((struct vm_userfaultfd_ctx){ ctx }), > > - anon_vma_name(vma)); > > + prev = vma_modify_uffd(&vmi, prev, vma, start, vma_end, > > + new_flags, > > + ((struct vm_userfaultfd_ctx){ ctx })); > > if (prev) { > > This will hit also for IS_ERR(prev), no? > > > /* vma_merge() invalidated the mas */ > > vma = prev; > > goto next; > > } > > - if (vma->vm_start < start) { > > - ret = split_vma(&vmi, vma, start, 1); > > - if (ret) > > - break; > > - } > > - if (vma->vm_end > end) { > > - ret = split_vma(&vmi, vma, end, 0); > > - if (ret) > > - break; > > + > > + if (IS_ERR(prev)) { > > So here's too late to test for it. AFAICS the other usages are like this as > well. Oh dear :) yes you're right, I will rework this in v2 for all cases. > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index a7b667786cde..c069813f215f 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -3253,6 +3253,29 @@ extern struct vm_area_struct *copy_vma(struct vm_area_struct **, > > unsigned long addr, unsigned long len, pgoff_t pgoff, > > bool *need_rmap_locks); > > extern void exit_mmap(struct mm_struct *); > > +struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi, > > + struct vm_area_struct *prev, > > + struct vm_area_struct *vma, > > + unsigned long start, unsigned long end, > > + unsigned long new_flags); > > +struct vm_area_struct *vma_modify_flags_name(struct vma_iterator *vmi, > > + struct vm_area_struct *prev, > > + struct vm_area_struct *vma, > > + unsigned long start, > > + unsigned long end, > > + unsigned long new_flags, > > + struct anon_vma_name *new_name); > > +struct vm_area_struct *vma_modify_policy(struct vma_iterator *vmi, > > + struct vm_area_struct *prev, > > + struct vm_area_struct *vma, > > + unsigned long start, unsigned long end, > > + struct mempolicy *new_pol); > > +struct vm_area_struct *vma_modify_uffd(struct vma_iterator *vmi, > > + struct vm_area_struct *prev, > > + struct vm_area_struct *vma, > > + unsigned long start, unsigned long end, > > + unsigned long new_flags, > > + struct vm_userfaultfd_ctx new_ctx); > > Could these be instead static inline wrappers, and vma_modify exported > instead of static? I started by trying this but sadly the vma_policy() helper needs the mempolicy header and trying to important that into mm.h produces a horror show of things breaking. As discussed via IRC, will look to see whether we can sensibly move this define into mm_types.h and then we can shift these. > > Maybe we could also move this to mm/internal.h? Which would mean > fs/userfaultfd.c would have to start including it, but as it's already so > much rooted in mm, it shouldn't be wrong? I'm not a fan of trying to have fs/userfaultfd.c to important mm/internal.h, seems like a bridge too far there. I think it's a bit odd that the fs bit invokes mm bits but the mm bit doesn't, but this might be an artifact of how uffd is implemented. I do in principle like the idea, as we can then seriously shift what I consider to be impl details (mergey/splitty) to being as internal as we can make it, but I think perhaps it's something we can address later if it makes sense to move some uffd bits around. > > > > > static inline int check_data_rlimit(unsigned long rlim, > > unsigned long new, > > diff --git a/mm/madvise.c b/mm/madvise.c > > index a4a20de50494..73024693d5c8 100644 >