Re: [RFC PATCH] mm/mbind: Introduce process_mbind() syscall for external memory binding

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 22, 2023 at 03:31:05PM -0600, Vinicius Petrucci wrote:
> From: Vinicius Tavares Petrucci <vtavarespetr@xxxxxxxxxx>
> 
> The proposed API is as follows:
> 
> long process_mbind(int pidfd, 
>                 const struct iovec *iovec, 
>                 unsigned long vlen, 
>                 unsigned long mode, 
>                 const unsigned long *nmask,
>                 unsigned int flags);
> 
> The `pidfd` argument is used to select the process that is identified by the PID file 
> descriptor provided in pidfd. (See pidofd_open(2) for more information)
>

This is probably a more maintainable interface than the pid_t interface
in my RFC, but I have not used pidfd extensively myself.

I can pivot my RFC to utilize pidfd as a whole and pop this on top, if
the other interfaces are generally useful.

> Please note the initial `maxnode` parameter from `mbind` was omitted 
> to ensure the API doesn't exceed 6 arguments. Instead, the constant 
> MAX_NUMNODES was utilized.
> 

I don't think this will work, users have traditionally been allowed to
shorten their nodemasks, and also for some level of portability.

We may want to consider an arg structure, rather than just chopping an
argument off.

> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 10a590ee1c89..91ee300fa728 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1215,11 +1215,10 @@ static struct folio *alloc_migration_target_by_mpol(struct folio *src,
>  }
>  #endif
>  
> -static long do_mbind(unsigned long start, unsigned long len,
> +static long do_mbind(struct mm_struct *mm, unsigned long start, unsigned long len,
>  		     unsigned short mode, unsigned short mode_flags,
>  		     nodemask_t *nmask, unsigned long flags)
>  {
> -	struct mm_struct *mm = current->mm;
>  	struct vm_area_struct *vma, *prev;
>  	struct vma_iterator vmi;
>  	struct migration_mpol mmpol;
> @@ -1465,10 +1464,84 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
>  	return 0;
>  }

This is a completely insufficient change to do_mbind.  do_mbind utilizes
`current` in a variety of places for nodemask (cpuset) validation and to
acquire the task's lock.  This will not work the way you intend it to,
you end up mixing up node masks between current and target task.

see here:
https://lore.kernel.org/all/20231122211200.31620-7-gregory.price@xxxxxxxxxxxx/

I had to make do_mbind operate on the target task explicitly.

We may want to combine this change and with my change so that your iovec
changes can be re-used, because that is a very nice feature.

> +	unsigned long maxnode = MAX_NUMNODES;
> +	int err;
> +	nodemask_t nodes;
> +
> +	err = sanitize_mpol_flags(&lmode, &mode_flags);
> +	if (err)
> +		goto out;
> +
> +	err = get_nodes(&nodes, nmask, maxnode);

per above, userland MAX_NUMNODES may or may not be equal to kernel
MAX_NUMNODES, so i don't think you can just chop this argument off.


I think we can combine our RFCs here and get this sorted out.

~Gregory




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux