On Tue, 2023-12-05 14:21:51 -0800, Paul Moore wrote: > > On Fri, Dec 1, 2023 at 4:00 PM Munehisa Kamata <kamatam@xxxxxxxxxx> wrote: > > On Fri, 2023-12-01 09:30:00 +0000, Alexey Dobriyan wrote: > > > On Wed, Nov 29, 2023 at 05:11:22PM -0800, Andrew Morton wrote: > > > > > > > > fyi... > > > > > > > > (yuk!) > > > > > > > > Begin forwarded message: > > > > Date: Thu, 30 Nov 2023 00:37:04 +0000 > > > > From: Munehisa Kamata <kamatam@xxxxxxxxxx> > > > > Subject: [PATCH] proc: Update inode upon changing task security attribute > > > > > > > > I'm not clear whether VFS is a better (or worse) place[1] to fix the > > > > problem described below and would like to hear opinion. > > > > > > > > If the /proc/[pid] directory is bind-mounted on a system with Smack > > > > enabled, and if the task updates its current security attribute, the task > > > > may lose access to files in its own /proc/[pid] through the mountpoint. > > > > > > > > $ sudo capsh --drop=cap_mac_override -- > > > > # mkdir -p dir > > > > # mount --bind /proc/$$ dir > > > > # echo AAA > /proc/$$/task/current # assuming built-in echo > > > > # cat /proc/$$/task/current # revalidate > > > > AAA > > > > # echo BBB > dir/attr/current > > > > # cat dir/attr/current > > > > cat: dir/attr/current: Permission denied > > > > # ls dir/ > > > > ls: cannot access dir/: Permission denied > > > > # cat /proc/$$/attr/current # revalidate > > > > BBB > > > > # cat dir/attr/current > > > > BBB > > > > # echo CCC > /proc/$$/attr/current > > > > # cat dir/attr/current > > > > cat: dir/attr/current: Permission denied > > > > > > > > This happens because path lookup doesn't revalidate the dentry of the > > > > /proc/[pid] when traversing the filesystem boundary, so the inode security > > > > blob of the /proc/[pid] doesn't get updated with the new task security > > > > attribute. Then, this may lead security modules to deny an access to the > > > > directory. Looking at the code[2] and the /proc/pid/attr/current entry in > > > > proc man page, seems like the same could happen with SELinux. Though, I > > > > didn't find relevant reports. > > > > > > > > The steps above are quite artificial. I actually encountered such an > > > > unexpected denial of access with an in-house application sandbox > > > > framework; each app has its own dedicated filesystem tree where the > > > > process's /proc/[pid] is bind-mounted to and the app enters into via > > > > chroot. > > > > > > > > With this patch, writing to /proc/[pid]/attr/current (and its per-security > > > > module variant) updates the inode security blob of /proc/[pid] or > > > > /proc/[pid]/task/[tid] (when pid != tid) with the new attribute. > > > > > > > > [1] https://lkml.kernel.org/linux-fsdevel/4A2D15AF.8090000@xxxxxxx/ > > > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/security/selinux/hooks.c#n4220 > > > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > > Signed-off-by: Munehisa Kamata <kamatam@xxxxxxxxxx> > > > > --- > > > > fs/proc/base.c | 23 ++++++++++++++++++++--- > > > > 1 file changed, 20 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/fs/proc/base.c b/fs/proc/base.c > > > > index dd31e3b6bf77..bdb7bea53475 100644 > > > > --- a/fs/proc/base.c > > > > +++ b/fs/proc/base.c > > > > @@ -2741,6 +2741,7 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf, > > > > { > > > > struct inode * inode = file_inode(file); > > > > struct task_struct *task; > > > > + const char *name = file->f_path.dentry->d_name.name; > > > > void *page; > > > > int rv; > > > > > > > > @@ -2784,10 +2785,26 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf, > > > > if (rv < 0) > > > > goto out_free; > > > > > > > > - rv = security_setprocattr(PROC_I(inode)->op.lsm, > > > > - file->f_path.dentry->d_name.name, page, > > > > - count); > > > > + rv = security_setprocattr(PROC_I(inode)->op.lsm, name, page, count); > > > > mutex_unlock(¤t->signal->cred_guard_mutex); > > > > + > > > > + /* > > > > + * Update the inode security blob in advance if the task's security > > > > + * attribute was updated > > > > + */ > > > > + if (rv > 0 && !strcmp(name, "current")) { > > > > + struct pid *pid; > > > > + struct proc_inode *cur, *ei; > > > > + > > > > + rcu_read_lock(); > > > > + pid = get_task_pid(current, PIDTYPE_PID); > > > > + hlist_for_each_entry(cur, &pid->inodes, sibling_inodes) > > > > + ei = cur; > > > > > > Should this "break;"? Why is only the last inode in the list updated? > > > Should it be the first? All of them? > > > > If it picks up the first node, it may end up updating /proc/[pid]/task/[tid] > > rather than /proc/[pid] (when pid == tid) and the task may be denied access > > to its own /proc/[pid] afterward. > > > > I think updating all of them won't hurt. But, as long as /proc/[pid] is > > accessible, the rest of the inodes should be updated upon path lookup via > > revalidation as usual. > > > > When pid != tid, it only updates /proc/[pid]/task/[tid] and the thread may > > lose an access to /proc/[pid], but I think it's okay as it's a matter of > > security policy enforced by security modules. Casey, do you have any > > comments here? > > > > > > + put_pid(pid); > > > > + pid_update_inode(current, &ei->vfs_inode); > > > > + rcu_read_unlock(); > > > > + } > > I think my thoughts are neatly summarized by Andrew's "yuk!" comment > at the top. However, before we go too much further on this, can we > get clarification that Casey was able to reproduce this on a stock > upstream kernel? Last I read in the other thread Casey wasn't seeing > this problem on Linux v6.5. > > However, for the moment I'm going to assume this is a real problem, is > there some reason why the existing pid_revalidate() code is not being > called in the bind mount case? From what I can see in the original > problem report, the path walk seems to work okay when the file is > accessed directly from /proc, but fails when done on the bind mount. > Is there some problem with revalidating dentrys on bind mounts? Hi Paul, https://lkml.kernel.org/linux-fsdevel/20090608201745.GO8633@xxxxxxxxxxxxxxxxxx/ After reading this thread, I have doubt about solving this in VFS. Honestly, however, I'm not sure if it's entirely relevant today. I think this behavior itself is not specific to bind mounts. Though, /proc/[pid] files are a bit special since its inode security blob has to be updated whenever the task's security attribute changed and it requries revalidation. I also considered .d_weak_revalidate, but it only helps when the final component of the path is the mountpoint on path lookup; a task will need to explicitly access the mountpoint once before accessing the files under the directory... I don't think it's a solution. Thanks, Munehisa > -- > paul-moore.com >