Someone ought to cc Ted since I asked him about this topic this morning and he said he hadn't noticed it going by... On Mon, May 22, 2023 at 12:50:19PM +0200, Christian Brauner wrote: > On Mon, May 22, 2023 at 08:49:50AM +1000, Dave Chinner wrote: > > On Sat, May 20, 2023 at 11:17:35AM +0200, Christian Brauner wrote: > > > On Fri, May 19, 2023 at 03:42:38PM -0400, Mimi Zohar wrote: > > > > On Fri, 2023-04-07 at 10:31 +0200, Christian Brauner wrote: > > > > > So, I think we want both; we want the ovl_copyattr() and the > > > > > vfs_getattr_nosec() change: > > > > > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > > > is in line what we do with all other inode attributes. IOW, the > > > > > overlayfs inode's i_version counter should aim to mirror the > > > > > relevant layer's i_version counter. I wouldn't know why that > > > > > shouldn't be the case. Asking the other way around there doesn't > > > > > seem to be any use for overlayfs inodes to have an i_version that > > > > > isn't just mirroring the relevant layer's i_version. > > > > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > > > > Currently, ima assumes that it will get the correct i_version from > > > > > an inode but that just doesn't hold for stacking filesystem. > > > > > > > > > > While (1) would likely just fix the immediate bug (2) is correct and > > > > > _robust_. If we change how attributes are handled vfs_*() helpers will > > > > > get updated and ima with it. Poking at raw inodes without using > > > > > appropriate helpers is much more likely to get ima into trouble. > > > > > > > > In addition to properly setting the i_version for IMA, EVM has a > > > > similar issue with i_generation and s_uuid. Adding them to > > > > ovl_copyattr() seems to resolve it. Does that make sense? > > > > > > > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > > > > index 923d66d131c1..cd0aeb828868 100644 > > > > --- a/fs/overlayfs/util.c > > > > +++ b/fs/overlayfs/util.c > > > > @@ -1118,5 +1118,8 @@ void ovl_copyattr(struct inode *inode) > > > > inode->i_atime = realinode->i_atime; > > > > inode->i_mtime = realinode->i_mtime; > > > > inode->i_ctime = realinode->i_ctime; > > > > + inode->i_generation = realinode->i_generation; > > > > + if (inode->i_sb) > > > > + uuid_copy(&inode->i_sb->s_uuid, &realinode->i_sb- > > > > > > Overlayfs can consist of multiple lower layers and each of those lower > > > layers may have a different uuid. So everytime you trigger a > > > ovl_copyattr() on a different layer this patch would alter the uuid of > > > the overlayfs superblock. > > > > > > In addition the uuid should be set when the filesystem is mounted. > > > Unless the filesystem implements a dedicated ioctl() - like ext4 - to > > > change the uuid. > > > > IMO, that ext4 functionality is a landmine waiting to be stepped on. > > > > We should not be changing the sb->s_uuid of filesysetms dynamically. > > Yeah, I kinda agree. If it works for ext4 and it's an ext4 specific > ioctl then this is fine though. Now that Dave's brought up all kinds of questions about other parts of the kernel using s_uuid for things, I'm starting to think that even ext4 shouldn't be changing its own uuid on the fly. Unless, of course, someone writes a way to record when s_uuid has been accessed by something (e.g. pnfs, ima, etc) to lock out changes to the value... > Thanks for bringing this up. I had some thoughts on this (mostly at the > end of this mail) but haven't had the time to express them. > > > The VFS does not guarantee in any way that it is safe to change the > > sb->s_uuid (i.e. no locking, no change notifications, no udev > > events, etc). Various subsystems - both in the kernel and in > > userspace - use the sb->s_uuid as a canonical and/or persistent > > filesystem/device identifier and are unprepared to have it change > > while the filesystem is mounted and active. ...just like Dave just said. Heh. :( --D > Yes, it is not a VFS concept for sure. > > > > > I commented on this from an XFS perspective here when it was > > proposed to copy this ext4 mis-feature in XFS: > > > > https://lore.kernel.org/linux-xfs/20230314062847.GQ360264@xxxxxxxxxxxxxxxxxxx/ > > So I read the thread back then and I agree with you specifically about: > > * changing uuid dynamically isn't a well-defined concept > * hoisting the ext4 specific ioctl that allows changing the uuid > dynamically into a generic vfs ioctl is premature and gives the > impression that this is a well-defined concept when it isn't. > * the chosen data structure with a flexible array member would probably > work but is suboptimal > > > > > Further to this, I also suspect that changing uuids online will > > cause issues with userspace caching of fs uuids (e.g. libblkid and > > anything that uses it) and information that uses uuids to identify > > the filesystem that are set up at mount time (/dev/disk/by-uuid/ > > links, etc) by kernel events sent to userspace helpers... > > Yeah, that's a valid concern as it's common practice to put uuids into > /etc/fstab so if they were allowed to change while the filesystem is > mounted/superblock is active the minimum thing needed is for userspace > get a uevent so the /dev/disk/by-uuid/$uuid symlink can be updated by > udev. But I digress. > > > > > IMO, we shouldn't even be considering dynamic sb->s_uuid changes > > without first working through the full system impacts of having > > persistent userspace-visible filesystem identifiers change > > dynamically... > > Yes. > > --- > > The thing that I think we could do is have all filesystems that can > reasonably support it set a uuid. We currently don't do that. If we > would start doing that then all filesystems that currently don't > implement a separate f_fsid based on e.g., the disk's device number can > just generate the f_fsid based on the uuid. This will make all these > filesystems available to be used with fanotify - which requires f_fsid > to be set for its most useful features. > > This is often the most useful for filesystems such as tmpfs which gained > support for uuids quite recently. For such pseudo filesystems the > lifetime of the uuid would be the lifetime of the superblock in contrast > to filesystems like xfs that persist the uuid to disk. IOW, if you > mount -t tmpfs tmpfs /mnt; umount /mnt; mount -t tmpfs tmpfs /mnt then > you get a new uuid but the uuid stays fixed for the lifetime of the > superblock and can't be changed. > > So the patchset that you objected had one part that made sense to me > which was to hoist the ioctl that _gets_ the uuid from a filesystems > into a generic ioctl. But I agree that the structure wasn't chosen > nicely. I would prefer if this was a fixed size but extensible structure > which is a concept we've had for a long time. So say we were to chose > the following structure layout for the generic ioctl: > > struct fsuuid { > __u32 fsu_len; > __u32 fsu_flags; > __u8 fsu_uuid[16]; // 8 * 16 = 128 = 64 * 2 > }; > > then this would be compatible with ext4. It would also be extensible if > we wanted to add additional fields in the future or switch to a new uuid > format or whatever. > > A while back we did work for extensible struct in system calls but these > extensible structs also work with ioctls. > > For example, see what we did for kernel/seccomp.c: > > /* Extensible Argument ioctls */ > #define EA_IOCTL(cmd) ((cmd) & ~(IOC_INOUT | IOCSIZE_MASK)) > > switch (EA_IOCTL(cmd)) { > case EA_IOCTL(SECCOMP_IOCTL_NOTIF_ADDFD): > return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd)); > default: > return -EINVAL; > } > > and then > > static long seccomp_notify_addfd(struct seccomp_filter *filter, > struct seccomp_notif_addfd __user *uaddfd, > unsigned int size) > { > [...] > > BUILD_BUG_ON(sizeof(addfd) < SECCOMP_NOTIFY_ADDFD_SIZE_VER0); > BUILD_BUG_ON(sizeof(addfd) != SECCOMP_NOTIFY_ADDFD_SIZE_LATEST); > > if (size < SECCOMP_NOTIFY_ADDFD_SIZE_VER0 || size >= PAGE_SIZE) > return -EINVAL; > > ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size); > if (ret) > return ret; > > [...] > } > > So the struct is versioned by size the same as for system calls. The > difference for the ioctl is that the size is already encoded in the > ioctl when it is defined. So even with a fixed size struct it is > trivially possible to extend the struct later as long as the extension > is 64bit aligned.