On Wed, Mar 3, 2021 at 4:04 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: > > On Wed, Mar 3, 2021 at 3:34 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote: > > > > On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: > > > > > > On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote: > > > > > > > > process_madvise currently requires ptrace attach capability. > > > > PTRACE_MODE_ATTACH gives one process complete control over another > > > > process. It effectively removes the security boundary between the > > > > two processes (in one direction). Granting ptrace attach capability > > > > even to a system process is considered dangerous since it creates an > > > > attack surface. This severely limits the usage of this API. > > > > The operations process_madvise can perform do not affect the correctness > > > > of the operation of the target process; they only affect where the data > > > > is physically located (and therefore, how fast it can be accessed). > > > > What we want is the ability for one process to influence another process > > > > in order to optimize performance across the entire system while leaving > > > > the security boundary intact. > > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > > and CAP_SYS_NICE for influencing process performance. > > > > > > > > Cc: stable@xxxxxxxxxxxxxxx # 5.10+ > > > > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx> > > > > Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx> > > > > Acked-by: Minchan Kim <minchan@xxxxxxxxxx> > > > > Acked-by: David Rientjes <rientjes@xxxxxxxxxx> > > > > --- > > > > changes in v3 > > > > - Added Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx> > > > > - Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993 > > > > - cc'ed stable@xxxxxxxxxxxxxxx # 5.10+ per Andrew's request > > > > - cc'ed linux-security-module@xxxxxxxxxxxxxxx per James Morris's request > > > > > > > > mm/madvise.c | 13 ++++++++++++- > > > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > > > index df692d2e35d4..01fef79ac761 100644 > > > > --- a/mm/madvise.c > > > > +++ b/mm/madvise.c > > > > @@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, > > > > goto release_task; > > > > } > > > > > > > > - mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS); > > > > + /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */ > > > > + mm = mm_access(task, PTRACE_MODE_READ_FSCREDS); > > > > if (IS_ERR_OR_NULL(mm)) { > > > > ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH; > > > > goto release_task; > > > > } > > > > > > > > + /* > > > > + * Require CAP_SYS_NICE for influencing process performance. Note that > > > > + * only non-destructive hints are currently supported. > > > > > > How is non-destructive defined? Is MADV_DONTNEED non-destructive? > > > > Non-destructive in this context means the data is not lost and can be > > recovered. I follow the logic described in > > https://lwn.net/Articles/794704/ where Minchan was introducing > > MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE > > and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED > > would be considered destructive hints. > > Note that process_madvise_behavior_valid() allows only MADV_COLD and > > MADV_PAGEOUT at the moment, which are both non-destructive. > > > > There is a plan to support MADV_DONTNEED for this syscall. Do we need > to change these access checks again with that support? I think so. Destructive hints affect the data, so we will probably need stricter checks for those hints.