On 12/5/2023 2:21 PM, Paul Moore wrote: > On Fri, Dec 1, 2023 at 4:00 PM Munehisa Kamata <kamatam@xxxxxxxxxx> wrote: >> On Fri, 2023-12-01 09:30:00 +0000, Alexey Dobriyan wrote: >>> On Wed, Nov 29, 2023 at 05:11:22PM -0800, Andrew Morton wrote: >>>> fyi... >>>> >>>> (yuk!) >>>> >>>> Begin forwarded message: >>>> Date: Thu, 30 Nov 2023 00:37:04 +0000 >>>> From: Munehisa Kamata <kamatam@xxxxxxxxxx> >>>> Subject: [PATCH] proc: Update inode upon changing task security attribute >>>> >>>> I'm not clear whether VFS is a better (or worse) place[1] to fix the >>>> problem described below and would like to hear opinion. >>>> >>>> If the /proc/[pid] directory is bind-mounted on a system with Smack >>>> enabled, and if the task updates its current security attribute, the task >>>> may lose access to files in its own /proc/[pid] through the mountpoint. >>>> >>>> $ sudo capsh --drop=cap_mac_override -- >>>> # mkdir -p dir >>>> # mount --bind /proc/$$ dir >>>> # echo AAA > /proc/$$/task/current # assuming built-in echo >>>> # cat /proc/$$/task/current # revalidate >>>> AAA >>>> # echo BBB > dir/attr/current >>>> # cat dir/attr/current >>>> cat: dir/attr/current: Permission denied >>>> # ls dir/ >>>> ls: cannot access dir/: Permission denied >>>> # cat /proc/$$/attr/current # revalidate >>>> BBB >>>> # cat dir/attr/current >>>> BBB >>>> # echo CCC > /proc/$$/attr/current >>>> # cat dir/attr/current >>>> cat: dir/attr/current: Permission denied >>>> >>>> This happens because path lookup doesn't revalidate the dentry of the >>>> /proc/[pid] when traversing the filesystem boundary, so the inode security >>>> blob of the /proc/[pid] doesn't get updated with the new task security >>>> attribute. Then, this may lead security modules to deny an access to the >>>> directory. Looking at the code[2] and the /proc/pid/attr/current entry in >>>> proc man page, seems like the same could happen with SELinux. Though, I >>>> didn't find relevant reports. >>>> >>>> The steps above are quite artificial. I actually encountered such an >>>> unexpected denial of access with an in-house application sandbox >>>> framework; each app has its own dedicated filesystem tree where the >>>> process's /proc/[pid] is bind-mounted to and the app enters into via >>>> chroot. >>>> >>>> With this patch, writing to /proc/[pid]/attr/current (and its per-security >>>> module variant) updates the inode security blob of /proc/[pid] or >>>> /proc/[pid]/task/[tid] (when pid != tid) with the new attribute. >>>> >>>> [1] https://lkml.kernel.org/linux-fsdevel/4A2D15AF.8090000@xxxxxxx/ >>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/security/selinux/hooks.c#n4220 >>>> >>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") >>>> Signed-off-by: Munehisa Kamata <kamatam@xxxxxxxxxx> >>>> --- >>>> fs/proc/base.c | 23 ++++++++++++++++++++--- >>>> 1 file changed, 20 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/fs/proc/base.c b/fs/proc/base.c >>>> index dd31e3b6bf77..bdb7bea53475 100644 >>>> --- a/fs/proc/base.c >>>> +++ b/fs/proc/base.c >>>> @@ -2741,6 +2741,7 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf, >>>> { >>>> struct inode * inode = file_inode(file); >>>> struct task_struct *task; >>>> + const char *name = file->f_path.dentry->d_name.name; >>>> void *page; >>>> int rv; >>>> >>>> @@ -2784,10 +2785,26 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf, >>>> if (rv < 0) >>>> goto out_free; >>>> >>>> - rv = security_setprocattr(PROC_I(inode)->op.lsm, >>>> - file->f_path.dentry->d_name.name, page, >>>> - count); >>>> + rv = security_setprocattr(PROC_I(inode)->op.lsm, name, page, count); >>>> mutex_unlock(¤t->signal->cred_guard_mutex); >>>> + >>>> + /* >>>> + * Update the inode security blob in advance if the task's security >>>> + * attribute was updated >>>> + */ >>>> + if (rv > 0 && !strcmp(name, "current")) { >>>> + struct pid *pid; >>>> + struct proc_inode *cur, *ei; >>>> + >>>> + rcu_read_lock(); >>>> + pid = get_task_pid(current, PIDTYPE_PID); >>>> + hlist_for_each_entry(cur, &pid->inodes, sibling_inodes) >>>> + ei = cur; >>> Should this "break;"? Why is only the last inode in the list updated? >>> Should it be the first? All of them? >> If it picks up the first node, it may end up updating /proc/[pid]/task/[tid] >> rather than /proc/[pid] (when pid == tid) and the task may be denied access >> to its own /proc/[pid] afterward. >> >> I think updating all of them won't hurt. But, as long as /proc/[pid] is >> accessible, the rest of the inodes should be updated upon path lookup via >> revalidation as usual. >> >> When pid != tid, it only updates /proc/[pid]/task/[tid] and the thread may >> lose an access to /proc/[pid], but I think it's okay as it's a matter of >> security policy enforced by security modules. Casey, do you have any >> comments here? >> >>>> + put_pid(pid); >>>> + pid_update_inode(current, &ei->vfs_inode); >>>> + rcu_read_unlock(); >>>> + } > I think my thoughts are neatly summarized by Andrew's "yuk!" comment > at the top. However, before we go too much further on this, can we > get clarification that Casey was able to reproduce this on a stock > upstream kernel? Last I read in the other thread Casey wasn't seeing > this problem on Linux v6.5. I was able to recreate the problem once given corrected instructions, which have to be followed exactly. > However, for the moment I'm going to assume this is a real problem, is > there some reason why the existing pid_revalidate() code is not being > called in the bind mount case? From what I can see in the original > problem report, the path walk seems to work okay when the file is > accessed directly from /proc, but fails when done on the bind mount. > Is there some problem with revalidating dentrys on bind mounts? >