Re: [PATCH 2/2] ovl: remove privs in ovl_fallocate()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 4, 2022 at 1:59 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> On Mon, Oct 03, 2022 at 03:30:40PM +0300, Amir Goldstein wrote:
> > Underlying fs doesn't remove privs because fallocate is called with
> > privileged mounter credentials.
> >
> > This fixes some failure in fstests generic/683..687.
> >
> > Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
> > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > ---
> >  fs/overlayfs/file.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index c8308da8909a..e90ac5376456 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -517,9 +517,16 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
> >       const struct cred *old_cred;
> >       int ret;
> >
> > +     inode_lock(inode);
> > +     /* Update mode */
> > +     ovl_copyattr(inode);
> > +     ret = file_remove_privs(file);
>
> First, thank you for picking this up!
>
> Let me analyze generic/683 failure of Test1 to see why you still see
> failures in this test:
>
> echo "Test 1 - qa_user, non-exec file $verb"
> setup_testfile
> chmod a+rws $junk_file
> commit_and_check "$qa_user" "$verb" 64k 64k
>
> So this creates a file with 6666 permissions. While the file has the
> S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. This is
> important in a little bit.
>
> On a regular filesystem like xfs what will happen is:
>
> sys_fallocate()
> -> vfs_fallocate()
>    -> xfs_file_fallocate()
>       -> file_modified()
>          -> __file_remove_privs()
>             -> dentry_needs_remove_privs()
>                -> should_remove_suid()
>             -> __remove_privs()
>                newattrs.ia_valid = ATTR_FORCE | kill;
>                -> notify_change()
>
> In should_remove_suid() we can see that ATTR_KILL_SUID is raised
> unconditionally because the file in the test has S_ISUID set.
>
> But we also see that ATTR_KILL_SGID won't be set because while the file
> is S_ISGID it is not S_IXGRP (see above) which is a condition for
> ATTR_KILL_SGID being raised.
>
> So by the time we call notify_change() we have attr->ia_valid set to
> ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
> ATTR_KILL_SUID is set and does:
>
> ia_valid = attr->ia_valid |= ATTR_MODE
> attr->ia_mode = (inode->i_mode & ~S_ISUID);
>
> which means that when we call setattr_copy() later we will definitely
> update inode->i_mode. Note that attr->ia_mode still contain S_ISGID.
>
> Now we call into the filesystem's ->setattr() inode operation which will end up
> calling setattr_copy(). Since ATTR_MODE is set we will hit:
>
> if (ia_valid & ATTR_MODE) {
>         umode_t mode = attr->ia_mode;
>         vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
>         if (!vfsgid_in_group_p(vfsgid) &&
>             !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
>                 mode &= ~S_ISGID;
>         inode->i_mode = mode;
> }
>

Can you think of a reason why the above should not be done
in notify_change() before even calling to ->setattr()?

Although, it wouldn't help because ovl_setattr() does:

    if (attr->ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID))
        attr->ia_valid &= ~ATTR_MODE;

> and since the caller in the test is neither capable nor in the group of the
> inode the S_ISGID bit is stripped.
>
> But now contrast this with overlayfs even after your changes. When
> ovl_setattr() is hit from ovl_fallocate()'s call to file_remove_privs()
> and calls ovl_do_notify_change() then we are doing this under the
> mounter's creds and so the S_ISGID bit is retained:
>
> sys_fallocate()
> -> vfs_fallocate()
>    -> ovl_fallocate()
>       -> file_remove_privs()
>          -> dentry_needs_remove_privs()
>             -> should_remove_suid()
>          -> __remove_privs()
>             newattrs.ia_valid = attr_force | kill;
>             -> notify_change()
>                -> ovl_setattr()
>                   // TAKE ON MOUNTER'S CREDS
>                   -> ovl_do_notify_change()
>                   // GIVE UP MOUNTER'S CREDS
>      // TAKE ON MOUNTER'S CREDS
>      -> vfs_fallocate()
>         -> xfs_file_fallocate()
>            -> file_modified()
>               -> __file_remove_privs()
>                  -> dentry_needs_remove_privs()
>                     -> should_remove_suid()
>                  -> __remove_privs()
>                     newattrs.ia_valid = attr_force | kill;
>                     -> notify_change()

The model in overlayfs is that security is checked twice
once on overlay inode with caller creds and once again
on xfs inode with mounter creds. Either of these checks
could result in clearing SUID/SGID bits.

In the call stack above, the outer should_remove_suid()
with caller creds sets ATTR_KILL_SUID and then the outer
notify_change() clears SUID and sets ATTR_MODE,
but ovl_setattr() clears ATTR_MODE and then the inner
notify_change() re-clears SUID and sets ATTR_MODE again.

If the outer notify_change() would have checked the in_group_p()
condition, clear SGID and set a flag ATTR_KILL_SGID_FORCE
then the inner notify_change() would see this flag and re-clear
SGID bit, just the same as it does with SUID bit in the stack stace
above.

Is this making any sense?

Thanks,
Amir.



[Index of Archives]     [Linux Filesystems Devel]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux