On Fri, May 15, 2020 at 6:21 PM Minchan Kim <minchan@xxxxxxxxxx> wrote: > > Based on discussion[1], people didn't feel we need to support both > pid and pidfd for every new coming API[2] so this patch keeps only > pidfd. This patch also changes flags's type with "unsigned int". > So finally, the API is as follows, > > ssize_t process_madvise(int pidfd, const struct iovec *iovec, > unsigned long vlen, int advice, unsigned int flags); > > DESCRIPTION > The process_madvise() system call is used to give advice or directions > to the kernel about the address ranges from external process as well as > local process. It provides the advice to address ranges of process > described by iovec and vlen. The goal of such advice is to improve system > or application performance. > > The pidfd selects the process referred to by the PID file descriptor > specified in pidfd. (See pidofd_open(2) for further information) > > The pointer iovec points to an array of iovec structures, defined in > <sys/uio.h> as: > > struct iovec { > void *iov_base; /* starting address */ > size_t iov_len; /* number of bytes to be advised */ > }; > > The iovec describes address ranges beginning at address(iov_base) > and with size length of bytes(iov_len). > > The vlen represents the number of elements in iovec. > > The advice is indicated in the advice argument, which is one of the > following at this moment if the target process specified by idtype and There is no idtype parameter anymore, so maybe just "if the target process is external"? > id is external. > > MADV_COLD > MADV_PAGEOUT > MADV_MERGEABLE > MADV_UNMERGEABLE > > Permission to provide a hint to external process is governed by a > ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). > > The process_madvise supports every advice madvise(2) has if target > process is in same thread group with calling process so user could > use process_madvise(2) to extend existing madvise(2) to support > vector address ranges. > > RETURN VALUE > On success, process_madvise() returns the number of bytes advised. > This return value may be less than the total number of requested > bytes, if an error occurred. The caller should check return value > to determine whether a partial advice occurred. > > [1] https://lore.kernel.org/linux-mm/20200509124817.xmrvsrq3mla6b76k@wittgenstein/ > [2] https://lore.kernel.org/linux-mm/9d849087-3359-c4ab-fbec-859e8186c509@xxxxxxxxxxxxx/ > Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx> > --- > mm/madvise.c | 42 +++++++++++++----------------------------- > 1 file changed, 13 insertions(+), 29 deletions(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index d3fbbe52d230..35c9b220146a 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -1229,8 +1229,8 @@ static int process_madvise_vec(struct task_struct *target_task, > return ret; > } > > -static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter, > - int behavior, unsigned long flags) > +static ssize_t do_process_madvise(int pidfd, struct iov_iter *iter, > + int behavior, unsigned int flags) > { > ssize_t ret; > struct pid *pid; > @@ -1241,26 +1241,12 @@ static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter, > if (flags != 0) > return -EINVAL; > > - switch (which) { > - case P_PID: > - if (upid <= 0) > - return -EINVAL; > - > - pid = find_get_pid(upid); > - if (!pid) > - return -ESRCH; > - break; > - case P_PIDFD: > - if (upid < 0) > - return -EINVAL; > - > - pid = pidfd_get_pid(upid); > - if (IS_ERR(pid)) > - return PTR_ERR(pid); > - break; > - default: > + if (pidfd < 0) > return -EINVAL; > - } > + > + pid = pidfd_get_pid(pidfd); > + if (IS_ERR(pid)) > + return PTR_ERR(pid); > > task = get_pid_task(pid, PIDTYPE_PID); > if (!task) { > @@ -1292,9 +1278,8 @@ static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter, > return ret; > } > > -SYSCALL_DEFINE6(process_madvise, int, which, pid_t, upid, > - const struct iovec __user *, vec, unsigned long, vlen, > - int, behavior, unsigned long, flags) > +SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, > + unsigned long, vlen, int, behavior, unsigned int, flags) > { > ssize_t ret; > struct iovec iovstack[UIO_FASTIOV]; > @@ -1303,19 +1288,18 @@ SYSCALL_DEFINE6(process_madvise, int, which, pid_t, upid, > > ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter); > if (ret >= 0) { > - ret = do_process_madvise(which, upid, &iter, behavior, flags); > + ret = do_process_madvise(pidfd, &iter, behavior, flags); > kfree(iov); > } > return ret; > } > > #ifdef CONFIG_COMPAT > -COMPAT_SYSCALL_DEFINE6(process_madvise, compat_int_t, which, > - compat_pid_t, upid, > +COMPAT_SYSCALL_DEFINE5(process_madvise, compat_int_t, pidfd, > const struct compat_iovec __user *, vec, > compat_ulong_t, vlen, > compat_int_t, behavior, > - compat_ulong_t, flags) > + compat_int_t, flags) > > { > ssize_t ret; > @@ -1326,7 +1310,7 @@ COMPAT_SYSCALL_DEFINE6(process_madvise, compat_int_t, which, > ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), > &iov, &iter); > if (ret >= 0) { > - ret = do_process_madvise(which, upid, &iter, behavior, flags); > + ret = do_process_madvise(pidfd, &iter, behavior, flags); > kfree(iov); > } > return ret; > -- > 2.26.2.761.g0e0b3e54be-goog > Reviewed-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>