On Tue, Oct 04, 2022 at 12:59:37PM +0200, Christian Brauner wrote: > On Mon, Oct 03, 2022 at 03:30:40PM +0300, Amir Goldstein wrote: > > Underlying fs doesn't remove privs because fallocate is called with > > privileged mounter credentials. > > > > This fixes some failure in fstests generic/683..687. > > > > Fixes: aab8848cee5e ("ovl: add ovl_fallocate()") > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > --- > > fs/overlayfs/file.c | 12 +++++++++++- > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > > index c8308da8909a..e90ac5376456 100644 > > --- a/fs/overlayfs/file.c > > +++ b/fs/overlayfs/file.c > > @@ -517,9 +517,16 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len > > const struct cred *old_cred; > > int ret; > > > > + inode_lock(inode); > > + /* Update mode */ > > + ovl_copyattr(inode); > > + ret = file_remove_privs(file); > > First, thank you for picking this up! > > Let me analyze generic/683 failure of Test1 to see why you still see > failures in this test: > > echo "Test 1 - qa_user, non-exec file $verb" > setup_testfile > chmod a+rws $junk_file > commit_and_check "$qa_user" "$verb" 64k 64k > > So this creates a file with 6666 permissions. While the file has the > S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. This is > important in a little bit. > > On a regular filesystem like xfs what will happen is: > > sys_fallocate() > -> vfs_fallocate() > -> xfs_file_fallocate() > -> file_modified() > -> __file_remove_privs() > -> dentry_needs_remove_privs() > -> should_remove_suid() > -> __remove_privs() > newattrs.ia_valid = ATTR_FORCE | kill; > -> notify_change() > > In should_remove_suid() we can see that ATTR_KILL_SUID is raised > unconditionally because the file in the test has S_ISUID set. > > But we also see that ATTR_KILL_SGID won't be set because while the file > is S_ISGID it is not S_IXGRP (see above) which is a condition for > ATTR_KILL_SGID being raised. > > So by the time we call notify_change() we have attr->ia_valid set to > ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that > ATTR_KILL_SUID is set and does: > > ia_valid = attr->ia_valid |= ATTR_MODE > attr->ia_mode = (inode->i_mode & ~S_ISUID); > > which means that when we call setattr_copy() later we will definitely > update inode->i_mode. Note that attr->ia_mode still contain S_ISGID. > > Now we call into the filesystem's ->setattr() inode operation which will end up > calling setattr_copy(). Since ATTR_MODE is set we will hit: > > if (ia_valid & ATTR_MODE) { > umode_t mode = attr->ia_mode; > vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode); > if (!vfsgid_in_group_p(vfsgid) && > !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID)) > mode &= ~S_ISGID; > inode->i_mode = mode; > } > > and since the caller in the test is neither capable nor in the group of the > inode the S_ISGID bit is stripped. > > But now contrast this with overlayfs even after your changes. When > ovl_setattr() is hit from ovl_fallocate()'s call to file_remove_privs() > and calls ovl_do_notify_change() then we are doing this under the > mounter's creds and so the S_ISGID bit is retained: > > sys_fallocate() > -> vfs_fallocate() > -> ovl_fallocate() > -> file_remove_privs() > -> dentry_needs_remove_privs() > -> should_remove_suid() > -> __remove_privs() > newattrs.ia_valid = attr_force | kill; > -> notify_change() > -> ovl_setattr() Note also that the setattr_prepare() call skips setgid checks because ATTR_FORCE is passed from file_remove_privs(). > // TAKE ON MOUNTER'S CREDS > -> ovl_do_notify_change() > // GIVE UP MOUNTER'S CREDS > // TAKE ON MOUNTER'S CREDS > -> vfs_fallocate() > -> xfs_file_fallocate() > -> file_modified() > -> __file_remove_privs() > -> dentry_needs_remove_privs() > -> should_remove_suid() > -> __remove_privs() > newattrs.ia_valid = attr_force | kill; > -> notify_change()