Re: [PATCH v5 2/2] mm: add a field to store names for private anonymous memory

[Cc linux-api]

On Wed 19-08-20 19:46:50, Sumit Semwal wrote:
> From: Colin Cross <ccross@xxxxxxxxxx>
> 
> In many userspace applications, and especially in the VM-based applications
> that Android uses heavily, there are multiple different allocators in use.
> At a minimum there is libc malloc and the stack, and in many cases there
> are libc malloc, the stack, direct syscalls to mmap anonymous memory, and
> multiple VM heaps (one for small objects, one for big objects, etc.).
> Each of these layers usually has its own tools to inspect its usage;
> malloc by compiling a debug version, the VM through heap inspection tools,
> and for direct syscalls there is usually no way to track them.
> 
> On Android we heavily use a set of tools that use an extended version of
> the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped
> in userspace and slice their usage by process, shared (COW) vs.  unique
> mappings, backing, etc.  This can account for real physical memory usage
> even in cases like fork without exec (which Android uses heavily to share
> as many private COW pages as possible between processes), Kernel SamePage
> Merging, and clean zero pages.  It produces a measurement of the pages
> that only exist in that process (USS, for unique), and a measurement of
> the physical memory usage of that process with the cost of shared pages
> being evenly split between processes that share them (PSS).
> 
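For readers unfamiliar with the two metrics, here is a minimal sketch of the accounting described above. It assumes a hypothetical per-page record (struct page_info) that already holds the number of processes mapping each page. A real tool would derive that count from the pagemap interfaces, which this sketch does not attempt.

```c
#include <stddef.h>

/* Hypothetical record: how many processes map one physical page. */
struct page_info {
	unsigned int mapcount;	/* 0 means the page is not present */
};

struct usage {
	double uss_pages;	/* pages mapped by this process only */
	double pss_pages;	/* each page weighted by 1 / mapcount */
};

/* Sum USS and PSS, in pages, over the pages mapped by one process. */
static struct usage account_pages(const struct page_info *pages, size_t n)
{
	struct usage u = { 0.0, 0.0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (pages[i].mapcount == 0)
			continue;	/* nothing to charge */
		if (pages[i].mapcount == 1)
			u.uss_pages += 1.0;
		/* Shared cost is split evenly between the sharers. */
		u.pss_pages += 1.0 / pages[i].mapcount;
	}
	return u;
}
```

With four pages whose map counts are 1, 1, 2 and 4, this charges 2 pages of USS and 2.75 pages of PSS.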
> If all anonymous memory is indistinguishable then figuring out the real
> physical memory usage (PSS) of each heap requires either a pagemap walking
> tool that can understand the heap debugging of every layer, or for every
> layer's heap debugging tools to implement the pagemap walking logic, in
> which case it is hard to get a consistent view of memory across the whole
> system.
> 
> Tracking the information in userspace leads to all sorts of problems.
> It either needs to be stored inside the process, which means every
> process has to have an API to export its current heap information upon
> request, or it has to be stored externally in a filesystem that
> somebody needs to clean up on crashes.  It needs to be readable while
> the process is still running, so it has to have some sort of
> synchronization with every layer of userspace.  Efficiently tracking
> the ranges requires reimplementing something like the kernel vma
> trees, and linking to it from every layer of userspace.  It requires
> more memory, more syscalls, more runtime cost, and more complexity to
> separately track regions that the kernel is already tracking.
> 
> This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
> userspace-provided name for anonymous vmas.  The names of named anonymous
> vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>].
> 
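A consumer of the new field might pick the name out of a maps line roughly as follows. This is only a sketch: parse_anon_name() is a hypothetical helper, and it assumes the [anon:<name>] token is the last field on the line, as in the format described above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy <name> out of a maps line that ends in "[anon:<name>]".
 * Returns 1 on success, 0 if the line carries no anon name or if
 * out is too small to hold it.
 */
static int parse_anon_name(const char *line, char *out, size_t outlen)
{
	const char *start = strstr(line, "[anon:");
	const char *end;
	size_t len;

	if (!start)
		return 0;
	start += strlen("[anon:");
	/* Use the last ']' because the name itself may contain one. */
	end = strrchr(start, ']');
	if (!end)
		return 0;
	len = (size_t)(end - start);
	if (len + 1 > outlen)
		return 0;
	memcpy(out, start, len);
	out[len] = '\0';
	return 1;
}
```

A more careful parser would first skip the fixed address, permission, offset, device and inode fields, so that a file path containing the text "[anon:" is not misread.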
> Userspace can set the name for a region of memory by calling
> prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
> Setting the name to NULL clears it.
> 
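From userspace, the call could be wrapped along these lines. This is a sketch, not code from the patch: map_named() is a hypothetical helper, and the two constants are defined locally with the values from the prctl.h hunk in case the installed headers lack them.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

#ifndef PR_SET_VMA
#define PR_SET_VMA		0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME	0
#endif

/*
 * Map len bytes of private anonymous memory and try to name it.
 * With this version of the patch the kernel keeps only the user
 * pointer, so name must stay valid for as long as the mapping exists.
 * Naming is best effort: if the prctl fails (for example on a kernel
 * without the feature) the mapping is still returned.
 */
static void *map_named(size_t len, const char *name)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	(void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
		    (unsigned long)p, (unsigned long)len,
		    (unsigned long)name);
	return p;
}
```

A caller would typically pass a string with static storage duration, such as a string literal, which satisfies the lifetime requirement.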
> The name is stored in a user pointer in the shared union in vm_area_struct
> that points to a null terminated string inside the user process.  vmas
> that point to the same address and are otherwise mergeable will be merged,
> but vmas that point to equivalent strings at different addresses will not
> be merged.
> 
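The merging rule above is plain pointer identity, which a small userspace sketch can illustrate. anon_names_mergeable() is a hypothetical stand-in for the comparison the patch adds to is_mergeable_vma().

```c
#include <stddef.h>

/*
 * Two names are considered the same only if the pointers are equal.
 * The string contents are never compared.
 */
static int anon_names_mergeable(const char *a, const char *b)
{
	return a == b;
}
```

Two separate buffers that both contain "heap" therefore do not match, while two unnamed regions (both NULL) do. An allocator that wants its regions merged has to pass the same pointer every time.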
> The idea to store a userspace pointer to reduce the complexity within mm
> (at the expense of the complexity of reading /proc/pid/mem) came from Dave
> Hansen.  This results in no runtime overhead in the mm subsystem other
> than comparing the anon_name pointers when considering vma merging.  The
> pointer is stored in a union with fields that are only used on file-backed
> mappings, so it does not increase memory usage.
> (Upstream changed to remove the union, so this patch adds it back as well)
> 
> Signed-off-by: Colin Cross <ccross@xxxxxxxxxx>
> Cc: Pekka Enberg <penberg@xxxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
> Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> Cc: Jan Glauber <jan.glauber@xxxxxxxxx>
> Cc: John Stultz <john.stultz@xxxxxxxxxx>
> Cc: Rob Landley <rob@xxxxxxxxxxx>
> Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: "Serge E. Hallyn" <serge.hallyn@xxxxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: Michel Lespinasse <walken@xxxxxxxxxx>
> Cc: Tang Chen <tangchen@xxxxxxxxxxxxxx>
> Cc: Robin Holt <holt@xxxxxxx>
> Cc: Shaohua Li <shli@xxxxxxxxxxxx>
> Cc: Sasha Levin <sasha.levin@xxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Sumit Semwal <sumit.semwal@xxxxxxxxxx>
> ---
>  Documentation/filesystems/proc.rst |  2 ++
>  fs/proc/task_mmu.c                 | 24 ++++++++++++-
>  include/linux/mm.h                 |  5 ++-
>  include/linux/mm_types.h           | 23 +++++++++++--
>  include/uapi/linux/prctl.h         |  3 ++
>  kernel/sys.c                       | 32 ++++++++++++++++++
>  mm/interval_tree.c                 | 34 +++++++++----------
>  mm/madvise.c                       | 54 +++++++++++++++++++++++++++---
>  mm/mempolicy.c                     |  3 +-
>  mm/mlock.c                         |  2 +-
>  mm/mmap.c                          | 38 ++++++++++++---------
>  mm/mprotect.c                      |  2 +-
>  12 files changed, 177 insertions(+), 45 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 533c79e8d2cd..41a9cea73b8b 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -429,6 +429,8 @@ is not associated with a file:
>   [stack]                    the stack of the main process
>   [vdso]                     the "virtual dynamic shared object",
>                              the kernel system call handler
> +[anon:<name>]               an anonymous mapping that has been
> +                            named by userspace
>   =======                    ====================================
>  
>   or if empty, the mapping is anonymous.
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 5066b0251ed8..136fd3c3ad7b 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -97,6 +97,21 @@ unsigned long task_statm(struct mm_struct *mm,
>  	return mm->total_vm;
>  }
>  
> +static void seq_print_vma_name(struct seq_file *m, struct vm_area_struct *vma)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +	char anon_name[NAME_MAX + 1];
> +	int n;
> +
> +	n = access_remote_vm(mm, (unsigned long)vma_anon_name(vma),
> +			     anon_name, NAME_MAX, 0);
> +	if (n > 0) {
> +		seq_puts(m, "[anon:");
> +		seq_write(m, anon_name, strnlen(anon_name, n));
> +		seq_putc(m, ']');
> +	}
> +}
> +
>  #ifdef CONFIG_NUMA
>  /*
>   * Save get_task_policy() for show_numa_map().
> @@ -319,8 +334,15 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
>  			goto done;
>  		}
>  
> -		if (is_stack(vma))
> +		if (is_stack(vma)) {
>  			name = "[stack]";
> +			goto done;
> +		}
> +
> +		if (vma_anon_name(vma)) {
> +			seq_pad(m, ' ');
> +			seq_print_vma_name(m, vma);
> +		}
>  	}
>  
>  done:
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1983e08f5906..c64171529254 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2484,7 +2484,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
>  extern struct vm_area_struct *vma_merge(struct mm_struct *,
>  	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
>  	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
> -	struct mempolicy *, struct vm_userfaultfd_ctx);
> +	struct mempolicy *, struct vm_userfaultfd_ctx, const char __user *);
>  extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
>  extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
>  	unsigned long addr, int new_below);
> @@ -3123,5 +3123,8 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
>  
>  extern int sysctl_nr_trim_pages;
>  
> +int madvise_set_anon_name(unsigned long start, unsigned long len_in,
> +			  unsigned long name_addr);
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 496c3ff97cce..ac8d687ebfb5 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -336,10 +336,18 @@ struct vm_area_struct {
>  	/*
>  	 * For areas with an address space and backing store,
>  	 * linkage into the address_space->i_mmap interval tree.
> +	 *
> +	 * For private anonymous mappings, a pointer to a null terminated string
> +	 * in the user process containing the name given to the vma, or NULL
> +	 * if unnamed.
>  	 */
> -	struct {
> -		struct rb_node rb;
> -		unsigned long rb_subtree_last;
> +
> +	union {
> +		struct {
> +			struct rb_node rb;
> +			unsigned long rb_subtree_last;
> +		} interval;
> +		const char __user *anon_name;
>  	} shared;
>  
>  	/*
> @@ -772,4 +780,13 @@ typedef struct {
>  	unsigned long val;
>  } swp_entry_t;
>  
> +/* Return the name for an anonymous mapping or NULL for a file-backed mapping */
> +static inline const char __user *vma_anon_name(struct vm_area_struct *vma)
> +{
> +	if (vma->vm_file)
> +		return NULL;
> +
> +	return vma->shared.anon_name;
> +}
> +
>  #endif /* _LINUX_MM_TYPES_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 07b4f8131e36..10773270f67b 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -238,4 +238,7 @@ struct prctl_mm_map {
>  #define PR_SET_IO_FLUSHER		57
>  #define PR_GET_IO_FLUSHER		58
>  
> +#define PR_SET_VMA		0x53564d41
> +# define PR_SET_VMA_ANON_NAME		0
> +
>  #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index ca11af9d815d..da90837b5ccd 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2280,6 +2280,35 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
>  
>  #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
>  
> +#ifdef CONFIG_MMU
> +static int prctl_set_vma(unsigned long opt, unsigned long addr,
> +			 unsigned long len, unsigned long arg)
> +{
> +	struct mm_struct *mm = current->mm;
> +	int error;
> +
> +	mmap_write_lock(mm);
> +
> +	switch (opt) {
> +	case PR_SET_VMA_ANON_NAME:
> +		error = madvise_set_anon_name(addr, len, arg);
> +		break;
> +	default:
> +		error = -EINVAL;
> +	}
> +
> +	mmap_write_unlock(mm);
> +
> +	return error;
> +}
> +#else /* CONFIG_MMU */
> +static int prctl_set_vma(unsigned long opt, unsigned long start,
> +			 unsigned long len_in, unsigned long arg)
> +{
> +	return -EINVAL;
> +}
> +#endif
> +
>  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		unsigned long, arg4, unsigned long, arg5)
>  {
> @@ -2530,6 +2559,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  
>  		error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
>  		break;
> +	case PR_SET_VMA:
> +		error = prctl_set_vma(arg2, arg3, arg4, arg5);
> +		break;
>  	default:
>  		error = -EINVAL;
>  		break;
> diff --git a/mm/interval_tree.c b/mm/interval_tree.c
> index 11c75fb07584..d684ce0762cd 100644
> --- a/mm/interval_tree.c
> +++ b/mm/interval_tree.c
> @@ -20,8 +20,8 @@ static inline unsigned long vma_last_pgoff(struct vm_area_struct *v)
>  	return v->vm_pgoff + vma_pages(v) - 1;
>  }
>  
> -INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
> -		     unsigned long, shared.rb_subtree_last,
> +INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.interval.rb,
> +		     unsigned long, shared.interval.rb_subtree_last,
>  		     vma_start_pgoff, vma_last_pgoff,, vma_interval_tree)
>  
>  /* Insert node immediately after prev in the interval tree */
> @@ -35,26 +35,26 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
>  
>  	VM_BUG_ON_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
>  
> -	if (!prev->shared.rb.rb_right) {
> +	if (!prev->shared.interval.rb.rb_right) {
>  		parent = prev;
> -		link = &prev->shared.rb.rb_right;
> +		link = &prev->shared.interval.rb.rb_right;
>  	} else {
> -		parent = rb_entry(prev->shared.rb.rb_right,
> -				  struct vm_area_struct, shared.rb);
> -		if (parent->shared.rb_subtree_last < last)
> -			parent->shared.rb_subtree_last = last;
> -		while (parent->shared.rb.rb_left) {
> -			parent = rb_entry(parent->shared.rb.rb_left,
> -				struct vm_area_struct, shared.rb);
> -			if (parent->shared.rb_subtree_last < last)
> -				parent->shared.rb_subtree_last = last;
> +		parent = rb_entry(prev->shared.interval.rb.rb_right,
> +				  struct vm_area_struct, shared.interval.rb);
> +		if (parent->shared.interval.rb_subtree_last < last)
> +			parent->shared.interval.rb_subtree_last = last;
> +		while (parent->shared.interval.rb.rb_left) {
> +			parent = rb_entry(parent->shared.interval.rb.rb_left,
> +					  struct vm_area_struct, shared.interval.rb);
> +			if (parent->shared.interval.rb_subtree_last < last)
> +				parent->shared.interval.rb_subtree_last = last;
>  		}
> -		link = &parent->shared.rb.rb_left;
> +		link = &parent->shared.interval.rb.rb_left;
>  	}
>  
> -	node->shared.rb_subtree_last = last;
> -	rb_link_node(&node->shared.rb, &parent->shared.rb, link);
> -	rb_insert_augmented(&node->shared.rb, &root->rb_root,
> +	node->shared.interval.rb_subtree_last = last;
> +	rb_link_node(&node->shared.interval.rb, &parent->shared.interval.rb, link);
> +	rb_insert_augmented(&node->shared.interval.rb, &root->rb_root,
>  			    &vma_interval_tree_augment);
>  }
>  
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 84482c21b029..7da8493fa6d3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -65,13 +65,14 @@ static int madvise_need_mmap_write(int behavior)
>   */
>  static int madvise_update_vma(struct vm_area_struct *vma,
>  			      struct vm_area_struct **prev, unsigned long start,
> -			      unsigned long end, unsigned long new_flags)
> +			      unsigned long end, unsigned long new_flags,
> +			      const char __user *new_anon_name)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	int error;
>  	pgoff_t pgoff;
>  
> -	if (new_flags == vma->vm_flags) {
> +	if (new_flags == vma->vm_flags && new_anon_name == vma_anon_name(vma)) {
>  		*prev = vma;
>  		return 0;
>  	}
> @@ -79,7 +80,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
>  	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  	*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
>  			  vma->vm_file, pgoff, vma_policy(vma),
> -			  vma->vm_userfaultfd_ctx);
> +			  vma->vm_userfaultfd_ctx, new_anon_name);
>  	if (*prev) {
>  		vma = *prev;
>  		goto success;
> @@ -112,10 +113,30 @@ static int madvise_update_vma(struct vm_area_struct *vma,
>  	 * vm_flags is protected by the mmap_lock held in write mode.
>  	 */
>  	vma->vm_flags = new_flags;
> +	if (!vma->vm_file)
> +		vma->shared.anon_name = new_anon_name;
>  
>  	return 0;
>  }
>  
> +static int madvise_vma_anon_name(struct vm_area_struct *vma,
> +				 struct vm_area_struct **prev,
> +				 unsigned long start, unsigned long end,
> +				 unsigned long name_addr)
> +{
> +	int error;
> +
> +	/* Only anonymous mappings can be named */
> +	if (vma->vm_file)
> +		return -EINVAL;
> +
> +	error = madvise_update_vma(vma, prev, start, end, vma->vm_flags,
> +				   (const char __user *)name_addr);
> +	if (error == -ENOMEM)
> +		error = -EAGAIN;
> +	return error;
> +}
> +
>  #ifdef CONFIG_SWAP
>  static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
>  	unsigned long end, struct mm_walk *walk)
> @@ -877,7 +898,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
>  		break;
>  	}
>  
> -	error = madvise_update_vma(vma, prev, start, end, new_flags);
> +	error = madvise_update_vma(vma, prev, start, end, new_flags,
> +				   vma_anon_name(vma));
>  
>  out:
>  	if (error == -ENOMEM)
> @@ -1059,6 +1081,30 @@ int madvise_walk_vmas(unsigned long start, unsigned long end,
>  	return unmapped_error;
>  }
>  
> +int madvise_set_anon_name(unsigned long start, unsigned long len_in,
> +			  unsigned long name_addr)
> +{
> +	unsigned long end;
> +	unsigned long len;
> +
> +	if (start & ~PAGE_MASK)
> +		return -EINVAL;
> +	len = (len_in + ~PAGE_MASK) & PAGE_MASK;
> +
> +	/* Check to see whether len was rounded up from small -ve to zero */
> +	if (len_in && !len)
> +		return -EINVAL;
> +
> +	end = start + len;
> +	if (end < start)
> +		return -EINVAL;
> +
> +	if (end == start)
> +		return 0;
> +
> +	return madvise_walk_vmas(start, end, name_addr, madvise_vma_anon_name);
> +}
> +
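The range validation in madvise_set_anon_name() can be tried out in userspace with a sketch like the one below. It assumes 4 KiB pages, and check_name_range() and the SKETCH_* macros are hypothetical names. The arithmetic is copied from the hunk above.

```c
#include <errno.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Validate [start, start + len_in) the way the patch does and store
 * the page-rounded end address in *end_out on success.
 */
static int check_name_range(unsigned long start, unsigned long len_in,
			    unsigned long *end_out)
{
	unsigned long len, end;

	if (start & ~SKETCH_PAGE_MASK)
		return -EINVAL;		/* start must be page aligned */
	/* Round the length up to a whole number of pages. */
	len = (len_in + ~SKETCH_PAGE_MASK) & SKETCH_PAGE_MASK;
	if (len_in && !len)
		return -EINVAL;		/* rounding wrapped to zero */
	end = start + len;
	if (end < start)
		return -EINVAL;		/* range wraps the address space */
	*end_out = end;
	return 0;
}
```

A one-byte request at 0x1000 is rounded up to cover the whole page and ends at 0x2000. A zero-length request succeeds with end equal to start, which the kernel function treats as a no-op.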
>  /*
>   * The madvise(2) system call.
>   *
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index eddbe4e56c73..94338d9bfe57 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -829,7 +829,8 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>  			((vmstart - vma->vm_start) >> PAGE_SHIFT);
>  		prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
>  				 vma->anon_vma, vma->vm_file, pgoff,
> -				 new_pol, vma->vm_userfaultfd_ctx);
> +				 new_pol, vma->vm_userfaultfd_ctx,
> +				 vma_anon_name(vma));
>  		if (prev) {
>  			vma = prev;
>  			next = vma->vm_next;
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 93ca2bf30b4f..8e0046c4642f 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -534,7 +534,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
>  	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  	*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
>  			  vma->vm_file, pgoff, vma_policy(vma),
> -			  vma->vm_userfaultfd_ctx);
> +			  vma->vm_userfaultfd_ctx, vma_anon_name(vma));
>  	if (*prev) {
>  		vma = *prev;
>  		goto success;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 40248d84ad5f..8f3cd352a48f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -987,7 +987,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
>   */
>  static inline int is_mergeable_vma(struct vm_area_struct *vma,
>  				struct file *file, unsigned long vm_flags,
> -				struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +				struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +				const char __user *anon_name)
>  {
>  	/*
>  	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
> @@ -1005,6 +1006,8 @@ static inline int is_mergeable_vma(struct vm_area_struct *vma,
>  		return 0;
>  	if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
>  		return 0;
> +	if (vma_anon_name(vma) != anon_name)
> +		return 0;
>  	return 1;
>  }
>  
> @@ -1037,9 +1040,10 @@ static int
>  can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
>  		     struct anon_vma *anon_vma, struct file *file,
>  		     pgoff_t vm_pgoff,
> -		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +		     const char __user *anon_name)
>  {
> -	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
> +	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
>  	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
>  		if (vma->vm_pgoff == vm_pgoff)
>  			return 1;
> @@ -1058,9 +1062,10 @@ static int
>  can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
>  		    struct anon_vma *anon_vma, struct file *file,
>  		    pgoff_t vm_pgoff,
> -		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +		    const char __user *anon_name)
>  {
> -	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
> +	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
>  	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
>  		pgoff_t vm_pglen;
>  		vm_pglen = vma_pages(vma);
> @@ -1071,9 +1076,9 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
>  }
>  
>  /*
> - * Given a mapping request (addr,end,vm_flags,file,pgoff), figure out
> - * whether that can be merged with its predecessor or its successor.
> - * Or both (it neatly fills a hole).
> + * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name),
> + * figure out whether that can be merged with its predecessor or its
> + * successor.  Or both (it neatly fills a hole).
>   *
>   * In most cases - when called for mmap, brk or mremap - [addr,end) is
>   * certain not to be mapped by the time vma_merge is called; but when
> @@ -1118,7 +1123,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  			unsigned long end, unsigned long vm_flags,
>  			struct anon_vma *anon_vma, struct file *file,
>  			pgoff_t pgoff, struct mempolicy *policy,
> -			struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> +			struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> +			const char __user *anon_name)
>  {
>  	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
>  	struct vm_area_struct *area, *next;
> @@ -1151,7 +1157,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  			mpol_equal(vma_policy(prev), policy) &&
>  			can_vma_merge_after(prev, vm_flags,
>  					    anon_vma, file, pgoff,
> -					    vm_userfaultfd_ctx)) {
> +					    vm_userfaultfd_ctx, anon_name)) {
>  		/*
>  		 * OK, it can.  Can we now merge in the successor as well?
>  		 */
> @@ -1160,7 +1166,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  				can_vma_merge_before(next, vm_flags,
>  						     anon_vma, file,
>  						     pgoff+pglen,
> -						     vm_userfaultfd_ctx) &&
> +						     vm_userfaultfd_ctx, anon_name) &&
>  				is_mergeable_anon_vma(prev->anon_vma,
>  						      next->anon_vma, NULL)) {
>  							/* cases 1, 6 */
> @@ -1183,7 +1189,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>  			mpol_equal(policy, vma_policy(next)) &&
>  			can_vma_merge_before(next, vm_flags,
>  					     anon_vma, file, pgoff+pglen,
> -					     vm_userfaultfd_ctx)) {
> +					     vm_userfaultfd_ctx, anon_name)) {
>  		if (prev && addr < prev->vm_end)	/* case 4 */
>  			err = __vma_adjust(prev, prev->vm_start,
>  					 addr, prev->vm_pgoff, NULL, next);
> @@ -1731,7 +1737,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  	 * Can we just expand an old mapping?
>  	 */
>  	vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
> -			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
> +			NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
>  	if (vma)
>  		goto out;
>  
> @@ -1779,7 +1785,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  		 */
>  		if (unlikely(vm_flags != vma->vm_flags && prev)) {
>  			merge = vma_merge(mm, prev, vma->vm_start, vma->vm_end, vma->vm_flags,
> -				NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX);
> +				NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
>  			if (merge) {
>  				fput(file);
>  				vm_area_free(vma);
> @@ -3063,7 +3069,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
>  
>  	/* Can we just expand an old private anonymous mapping? */
>  	vma = vma_merge(mm, prev, addr, addr + len, flags,
> -			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
> +			NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
>  	if (vma)
>  		goto out;
>  
> @@ -3262,7 +3268,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>  		return NULL;	/* should never get here */
>  	new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
>  			    vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> -			    vma->vm_userfaultfd_ctx);
> +			    vma->vm_userfaultfd_ctx, vma_anon_name(vma));
>  	if (new_vma) {
>  		/*
>  		 * Source vma may have been merged into new_vma
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ce8b8a5eacbb..d90c349a3fd9 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -454,7 +454,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
>  	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  	*pprev = vma_merge(mm, *pprev, start, end, newflags,
>  			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> -			   vma->vm_userfaultfd_ctx);
> +			   vma->vm_userfaultfd_ctx, vma_anon_name(vma));
>  	if (*pprev) {
>  		vma = *pprev;
>  		VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
> -- 
> 2.28.0
> 

-- 
Michal Hocko
SUSE Labs



