On 2/22/22 13:24, Christian Brauner wrote:
On Tue, Feb 22, 2022 at 09:33:40AM +0100, Christoph Hellwig wrote:
On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
bits.
Unfortunately chown syscall results in different callstask:
i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
has CAP_FSETID capable in init_user_ns rather than mntns userns.
Can you add an xfstests the exercises this path?
The fix itself looks good:
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
So for anything other than directories the s{g,u}id bits are cleared on
every chown in notify_change() by the vfs; even for the root user (Also
documented on chown(2) manpage).
Only exception - chown preserves setgid bit set on a
non-group-executable file (also documented there) but do not take root
privileges into account at vfs level.
So the only scenario were this change would be relevant is for
directories afaict:
1. So ext4 has the behavior:
ubuntu@f2-vm|~
> mkdir suid.dir
ubuntu@f2-vm|~
> perms ./suid.dir
drwxrwxr-x 775 (1000:1000) ./suid.dir
ubuntu@f2-vm|~
> chmod u+s ./suid.dir/
ubuntu@f2-vm|~
> perms ./suid.dir
drwsrwxr-x 4775 (1000:1000) ./suid.dir
ubuntu@f2-vm|~
> chmod g+s ./suid.dir/
ubuntu@f2-vm|~
> perms ./suid.dir
drwsrwsr-x 6775 (1000:1000) ./suid.dir
ubuntu@f2-vm|~
> chown 1000:1000 ./suid.dir/
ubuntu@f2-vm|~
> perms ./suid.dir/
drwsrwsr-x 6775 (1000:1000) ./suid.dir/
meaning that both s{g,u}id bits are retained for directories. (Just to
make this explicit: changing {g,u}id to the same {g,u}id still ends up
calling into the filesystem.)
2. Whereas xfs currently has:
brauner@wittgenstein|~
> mkdir suid.dir
brauner@wittgenstein|~
> perms ./suid.dir
drwxrwxr-x 775 ./suid.dir
brauner@wittgenstein|~
> chmod u+s ./suid.dir/
brauner@wittgenstein|~
> perms ./suid.dir
drwsrwxr-x 4775 ./suid.dir
brauner@wittgenstein|~
> chmod g+s ./suid.dir/
brauner@wittgenstein|~
> perms ./suid.dir
drwsrwsr-x 6775 ./suid.dir
brauner@wittgenstein|~
> chown 1000:1000 ./suid.dir/
brauner@wittgenstein|~
> perms ./suid.dir/
drwxrwxr-x 775 ./suid.dir/
meaning that both s{g,u}id bits are cleared for directories.
Since the vfs will always ensure that s{g,u}id bits are stripped for
anything that isn't a directory in the vfs:
- ATTR_KILL_S{G,U}ID is raised in chown_common():
if (!S_ISDIR(inode->i_mode))
newattrs.ia_valid |=
ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
- and then in notify_change() we'll get the bits stripped and ATTR_MODE
raised:
if (ia_valid & ATTR_KILL_SUID) {
if (mode & S_ISUID) {
ia_valid = attr->ia_valid |= ATTR_MODE;
attr->ia_mode = (inode->i_mode & ~S_ISUID);
}
}
if (ia_valid & ATTR_KILL_SGID) {
if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
So SGID is not killed if there is no S_IXGRP (yet no capability check)
Actually I do not really understand why do kernel expects filesystems to
further apply restrictions with CAP_FSETID. Why not kill it here since
we have all info?
if (!(ia_valid & ATTR_MODE)) {
ia_valid = attr->ia_valid |= ATTR_MODE;
attr->ia_mode = inode->i_mode;
}
attr->ia_mode &= ~S_ISGID;
}
}
we can change this codepath to just mirror setattr_copy() or switch
fully to setattr_copy() (if feasible).
Because as of right now the code seems to imply that the xfs code itself
is responsible for stripping s{g,u}id bits for all files whereas it is
the vfs that does it for any non-directory. So I'd propose to either try
and switch that code to setattr_copy() or to do open-code the
setattr_copy() check:
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index b79b3846e71b..ff55b31521a2 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -748,9 +748,13 @@ xfs_setattr_nonsize(
* The set-user-ID and set-group-ID bits of a file will be
* cleared upon successful return from chown()
*/
- if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
- !capable(CAP_FSETID))
- inode->i_mode &= ~(S_ISUID|S_ISGID);
+ if (iattr->ia_valid & ATTR_MODE) {
+ umode_t mode = iattr->ia_mode;
+ if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
+ !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
+ mode &= ~S_ISGID;
+ inode->i_mode = mode;
+ }
/*
* Change the ownerships and register quota modifications
which aligns xfs with ext4 and any other filesystem. Any thoughts on
this?
For @Andrey specifically: the tests these should go into:
https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/idmapped-mounts.c
Note that there are already setgid inheritance tests and set*id
execution tests in there.
You should be able to copy a lot of code from them. Could you please add
the test I sketched above and ideally also a test that the set{g,u}id
bits are stripped during chown for regular files?
Thanks for the link. To clarify what tests and result you expect:
- for directory chown we expect to preserve s{g,u}id
- for regfile chown we expect to preserve S_ISGID only if S_IXGRP is
absent and CAP_FSETID is present
JFYI: I found out this problem while running LTP (specifically
syscalls/chown02 test) on idmapped XFS. Maybe I will be able to find
more, who knows.