Re: Fw: [PATCH] proc: Update inode upon changing task security attribute

Munehisa Kamata <kamatam@xxxxxxxxxx> · Thu, 7 Dec 2023 18:14:33 -0800

On Tue, 2023-12-05 14:21:51 -0800, Paul Moore wrote:
>
> On Fri, Dec 1, 2023 at 4:00 PM Munehisa Kamata <kamatam@xxxxxxxxxx> wrote:
> > On Fri, 2023-12-01 09:30:00 +0000, Alexey Dobriyan wrote:
> > > On Wed, Nov 29, 2023 at 05:11:22PM -0800, Andrew Morton wrote:
> > > >
> > > > fyi...
> > > >
> > > > (yuk!)
> > > >
> > > > Begin forwarded message:
> > > > Date: Thu, 30 Nov 2023 00:37:04 +0000
> > > > From: Munehisa Kamata <kamatam@xxxxxxxxxx>
> > > > Subject: [PATCH] proc: Update inode upon changing task security attribute
> > > >
> > > > I'm not clear whether VFS is a better (or worse) place[1] to fix the
> > > > problem described below and would like to hear opinion.
> > > >
> > > > If the /proc/[pid] directory is bind-mounted on a system with Smack
> > > > enabled, and if the task updates its current security attribute, the task
> > > > may lose access to files in its own /proc/[pid] through the mountpoint.
> > > >
> > > >  $ sudo capsh --drop=cap_mac_override --
> > > >  # mkdir -p dir
> > > >  # mount --bind /proc/$$ dir
> > > >  # echo AAA > /proc/$$/task/current         # assuming built-in echo
> > > >  # cat /proc/$$/task/current                        # revalidate
> > > >  AAA
> > > >  # echo BBB > dir/attr/current
> > > >  # cat dir/attr/current
> > > >  cat: dir/attr/current: Permission denied
> > > >  # ls dir/
> > > >  ls: cannot access dir/: Permission denied
> > > >  # cat /proc/$$/attr/current                        # revalidate
> > > >  BBB
> > > >  # cat dir/attr/current
> > > >  BBB
> > > >  # echo CCC > /proc/$$/attr/current
> > > >  # cat dir/attr/current
> > > >  cat: dir/attr/current: Permission denied
> > > >
> > > > This happens because path lookup doesn't revalidate the dentry of the
> > > > /proc/[pid] when traversing the filesystem boundary, so the inode security
> > > > blob of the /proc/[pid] doesn't get updated with the new task security
> > > > attribute. Then, this may lead security modules to deny an access to the
> > > > directory. Looking at the code[2] and the /proc/pid/attr/current entry in
> > > > proc man page, seems like the same could happen with SELinux. Though, I
> > > > didn't find relevant reports.
> > > >
> > > > The steps above are quite artificial. I actually encountered such an
> > > > unexpected denial of access with an in-house application sandbox
> > > > framework; each app has its own dedicated filesystem tree where the
> > > > process's /proc/[pid] is bind-mounted to and the app enters into via
> > > > chroot.
> > > >
> > > > With this patch, writing to /proc/[pid]/attr/current (and its per-security
> > > > module variant) updates the inode security blob of /proc/[pid] or
> > > > /proc/[pid]/task/[tid] (when pid != tid) with the new attribute.
> > > >
> > > > [1] https://lkml.kernel.org/linux-fsdevel/4A2D15AF.8090000@xxxxxxx/
> > > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/security/selinux/hooks.c#n4220
> > > >
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Signed-off-by: Munehisa Kamata <kamatam@xxxxxxxxxx>
> > > > ---
> > > >  fs/proc/base.c | 23 ++++++++++++++++++++---
> > > >  1 file changed, 20 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > > > index dd31e3b6bf77..bdb7bea53475 100644
> > > > --- a/fs/proc/base.c
> > > > +++ b/fs/proc/base.c
> > > > @@ -2741,6 +2741,7 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
> > > >  {
> > > >     struct inode * inode = file_inode(file);
> > > >     struct task_struct *task;
> > > > +   const char *name = file->f_path.dentry->d_name.name;
> > > >     void *page;
> > > >     int rv;
> > > >
> > > > @@ -2784,10 +2785,26 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
> > > >     if (rv < 0)
> > > >             goto out_free;
> > > >
> > > > -   rv = security_setprocattr(PROC_I(inode)->op.lsm,
> > > > -                             file->f_path.dentry->d_name.name, page,
> > > > -                             count);
> > > > +   rv = security_setprocattr(PROC_I(inode)->op.lsm, name, page, count);
> > > >     mutex_unlock(&current->signal->cred_guard_mutex);
> > > > +
> > > > +   /*
> > > > +    *  Update the inode security blob in advance if the task's security
> > > > +    *  attribute was updated
> > > > +    */
> > > > +   if (rv > 0 && !strcmp(name, "current")) {
> > > > +           struct pid *pid;
> > > > +           struct proc_inode *cur, *ei;
> > > > +
> > > > +           rcu_read_lock();
> > > > +           pid = get_task_pid(current, PIDTYPE_PID);
> > > > +           hlist_for_each_entry(cur, &pid->inodes, sibling_inodes)
> > > > +                   ei = cur;
> > >
> > > Should this "break;"? Why is only the last inode in the list updated?
> > > Should it be the first? All of them?
> >
> > If it picks up the first node, it may end up updating /proc/[pid]/task/[tid]
> > rather than /proc/[pid] (when pid == tid) and the task may be denied access
> > to its own /proc/[pid] afterward.
> >
> > I think updating all of them won't hurt. But, as long as /proc/[pid] is
> > accessible, the rest of the inodes should be updated upon path lookup via
> > revalidation as usual.
> >
> > When pid != tid, it only updates /proc/[pid]/task/[tid] and the thread may
> > lose an access to /proc/[pid], but I think it's okay as it's a matter of
> > security policy enforced by security modules. Casey, do you have any
> > comments here?
> >
> > > > +           put_pid(pid);
> > > > +           pid_update_inode(current, &ei->vfs_inode);
> > > > +           rcu_read_unlock();
> > > > +   }
> 
> I think my thoughts are neatly summarized by Andrew's "yuk!" comment
> at the top.  However, before we go too much further on this, can we
> get clarification that Casey was able to reproduce this on a stock
> upstream kernel?  Last I read in the other thread Casey wasn't seeing
> this problem on Linux v6.5.
> 
> However, for the moment I'm going to assume this is a real problem, is
> there some reason why the existing pid_revalidate() code is not being
> called in the bind mount case?  From what I can see in the original
> problem report, the path walk seems to work okay when the file is
> accessed directly from /proc, but fails when done on the bind mount.
> Is there some problem with revalidating dentrys on bind mounts?

Hi Paul,

https://lkml.kernel.org/linux-fsdevel/20090608201745.GO8633@xxxxxxxxxxxxxxxxxx/

After reading this thread, I have doubt about solving this in VFS.
Honestly, however, I'm not sure if it's entirely relevant today.

I think this behavior itself is not specific to bind mounts. Though,
/proc/[pid] files are a bit special since its inode security blob has to be
updated whenever the task's security attribute changed and it requries
revalidation.

I also considered .d_weak_revalidate, but it only helps when the final
component of the path is the mountpoint on path lookup; a task will need to
explicitly access the mountpoint once before accessing the files under the
directory... I don't think it's a solution.

Thanks,
Munehisa

> -- 
> paul-moore.com
>