On Thu, Jan 16, 2020 at 11:45:18PM +0100, Christian Brauner wrote:
> Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
> introduced the ability to opt out of audit messages for accesses to
> various proc files since they are not violations of policy.
> While doing so it somehow switched the check from ns_capable() to
> has_ns_capability{_noaudit}(). That means it switched from checking the
> subjective credentials of the task to using the objective credentials. I
> couldn't find the original lkml thread and so I don't know why this switch
> was done. But it seems wrong since ptrace_has_cap() is currently only used
> in ptrace_may_access(). And it's used to check whether the calling task
> (subject) has the CAP_SYS_PTRACE capability in the provided user namespace
> to operate on the target task (object). According to the cred.h comments
> this would mean the subjective credentials of the calling task need to be
> used.

I don't follow this description. As far as I can see, both the current
code and your patch end up using current's cred, yes? I'm not following
the subjective/objective change mentioned here.

Before:

bool has_ns_capability(struct task_struct *t,
		       struct user_namespace *ns, int cap)
{
	int ret;

	rcu_read_lock();
	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NONE);
	rcu_read_unlock();

	return (ret == 0);
}
...
	return has_ns_capability(current, ns, CAP_SYS_PTRACE)

After:

	const struct cred *cred = current_cred(), ...
	...
	return security_capable(cred, ns, CAP_SYS_PTRACE, CAP_OPT_NOAUDIT);

The cred passed to security_capable() is the subject before and after.

> This switches it to use security_capable() because we only call
> ptrace_has_cap() in ptrace_may_access() and in there we already have a
> stable reference to the calling tasks creds under cred_guard_mutex so
> there's no need to go through another series of dereferences and rcu
> locking done in ns_capable{_noaudit}().

This makes sense to me -- now there's no possible race on the cred
changing between the two ptrace_has_cap() checks, yes?

However, I'm still trying to see where cred_guard_mutex comes into
play for callers of ptrace_may_access(). I see it for the object ("task"
arg in ptrace_may_access()), but if this is dealing with the cred on
current, it's just the RCU read lock protecting it (which I think is
fine here), and that reads as confusing in the commit log.

> As one example where this might be particularly problematic, Jann pointed
> out that in combination with the upcoming IORING_OP_OPENAT feature, this
> bug might allow unprivileged users to bypass the capability checks while
> asynchronously opening files like /proc/*/mem, because the capability
> checks for this would be performed against kernel credentials.

As in, winning a race between the two ptrace_has_cap() calls across a
cred transition?

> Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
> Cc: Eric Paris <eparis@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Reviewed-by: Serge Hallyn <serge@xxxxxxxxxx>
> Reviewed-by: Jann Horn <jannh@xxxxxxxxxx>
> Fixes: 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
> Signed-off-by: Christian Brauner <christian.brauner@xxxxxxxxxx>
> ---
>  kernel/ptrace.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index cb9ddcc08119..d146133e97f1 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -264,12 +264,13 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>  	return ret;
>  }
>
> -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> +static int ptrace_has_cap(const struct cred *cred, struct user_namespace *ns,
> +			  unsigned int mode)
>  {
>  	if (mode & PTRACE_MODE_NOAUDIT)
> -		return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> +		return security_capable(cred, ns, CAP_SYS_PTRACE, CAP_OPT_NOAUDIT);
>  	else
> -		return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> +		return security_capable(cred, ns, CAP_SYS_PTRACE, CAP_OPT_NONE);
>  }

Style nit -- can we just make this a single invocation of
security_capable(), something like:

	return security_capable(cred, ns, CAP_SYS_PTRACE,
				mode & PTRACE_MODE_NOAUDIT
				? CAP_OPT_NOAUDIT
				: CAP_OPT_NONE) == 0;

Obviously not required, but the longer if hurts my eyes. ;)

>
>  /* Returns 0 on success, -errno on denial. */
> @@ -321,7 +322,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
>  	    gid_eq(caller_gid, tcred->sgid) &&
>  	    gid_eq(caller_gid, tcred->gid))
>  		goto ok;
> -	if (ptrace_has_cap(tcred->user_ns, mode))
> +	if (ptrace_has_cap(cred, tcred->user_ns, mode))
>  		goto ok;
>  	rcu_read_unlock();
>  	return -EPERM;
> @@ -340,7 +341,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
>  	mm = task->mm;
>  	if (mm &&
>  	    ((get_dumpable(mm) != SUID_DUMP_USER) &&
> -	     !ptrace_has_cap(mm->user_ns, mode)))
> +	     !ptrace_has_cap(cred, mm->user_ns, mode)))
>  		return -EPERM;
>
>  	return security_ptrace_access_check(task, mode);
>
> base-commit: b3a987b0264d3ddbb24293ebff10eddfc472f653
> --
> 2.25.0
>

So, I think this change looks correct, but I find the commit subject and
log confusing (perhaps because I am dense) and misleading (again,
perhaps because I am dense).

-- 
Kees Cook
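
For anyone who wants to poke at the single-invocation form outside the
kernel, here is a rough user-space sketch. Everything in it is a stand-in
and an assumption on my part: the struct layouts, the constant values, and
the stubbed security_capable() are made up for illustration and only mirror
the "0 when capable, negative errno otherwise" convention visible in the
has_ns_capability() snippet above. The helper sketch_self_check() is
hypothetical as well.

```c
/* User-space sketch only: every definition here is a stand-in, not kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative constants; treat the values as assumptions. */
#define PTRACE_MODE_NOAUDIT 0x04
#define CAP_OPT_NONE        0x0
#define CAP_OPT_NOAUDIT     0x2
#define CAP_SYS_PTRACE      19

struct user_namespace { int id; };          /* hypothetical layout */
struct cred { bool has_sys_ptrace; };       /* hypothetical layout */

static unsigned int last_opts;  /* records the opts the stub last received */

/* Stub: returns 0 when the cred holds the capability, -EPERM otherwise. */
static int security_capable(const struct cred *cred, struct user_namespace *ns,
			    int cap, unsigned int opts)
{
	(void)ns;
	last_opts = opts;
	return (cap == CAP_SYS_PTRACE && cred->has_sys_ptrace) ? 0 : -EPERM;
}

/* Single call: choose the audit option inline, then map 0 to "true". */
static bool ptrace_has_cap(const struct cred *cred, struct user_namespace *ns,
			   unsigned int mode)
{
	return security_capable(cred, ns, CAP_SYS_PTRACE,
				(mode & PTRACE_MODE_NOAUDIT) ? CAP_OPT_NOAUDIT
							     : CAP_OPT_NONE) == 0;
}

/* Hypothetical helper that exercises both branches of the conditional. */
void sketch_self_check(void)
{
	struct user_namespace ns = { 0 };
	struct cred priv = { true };
	struct cred unpriv = { false };

	assert(ptrace_has_cap(&priv, &ns, 0));
	assert(last_opts == CAP_OPT_NONE);
	assert(!ptrace_has_cap(&unpriv, &ns, PTRACE_MODE_NOAUDIT));
	assert(last_opts == CAP_OPT_NOAUDIT);
}
```

The "== 0" is what keeps the helper's truth-value sense matching what
has_ns_capability() returned; without it, a successful check (0) would read
as "no capability" to the callers in __ptrace_may_access().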