On Tue, Aug 01, 2023 at 10:25:21AM -0400, Josef Bacik wrote: > On Tue, Aug 01, 2023 at 03:08:59PM +0200, Christian Brauner wrote: > > Summary > > ======= > > > > This introduces FSCONFIG_CMD_CREATE_EXCL which will allows userspace to > > implement something like mount -t ext4 --exclusive /dev/sda /B which > > fails if a superblock for the requested filesystem does already exist: > > > > Before this patch > > ----------------- > > > > $ sudo ./move-mount -f xfs -o source=/dev/sda4 /A > > Requesting filesystem type xfs > > Mount options requested: source=/dev/sda4 > > Attaching mount at /A > > Moving single attached mount > > Setting key(source) with val(/dev/sda4) > > > > $ sudo ./move-mount -f xfs -o source=/dev/sda4 /B > > Requesting filesystem type xfs > > Mount options requested: source=/dev/sda4 > > Attaching mount at /B > > Moving single attached mount > > Setting key(source) with val(/dev/sda4) > > > > After this patch with --exclusive as a switch for FSCONFIG_CMD_CREATE_EXCL > > -------------------------------------------------------------------------- > > > > $ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /A > > Requesting filesystem type xfs > > Request exclusive superblock creation > > Mount options requested: source=/dev/sda4 > > Attaching mount at /A > > Moving single attached mount > > Setting key(source) with val(/dev/sda4) > > > > $ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /B > > Requesting filesystem type xfs > > Request exclusive superblock creation > > Mount options requested: source=/dev/sda4 > > Attaching mount at /B > > Moving single attached mount > > Setting key(source) with val(/dev/sda4) > > Device or resource busy | move-mount.c: 300: do_fsconfig: i xfs: reusing existing superblock not allowed > > > > Details > > ======= > > > > As mentioned on the list (cf. [1]-[3]) mount requests like > > mount -t ext4 /dev/sda /A are ambigous for userspace. Either a new > > superblock has been created and mounted or an existing superblock has > > been reused and a bind-mount has been created. > > > > This becomes clear if we consider two processes creating the same mount > > for the same block device: > > > > P1 P2 > > fd_fs = fsopen("ext4"); fd_fs = fsopen("ext4"); > > fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda"); > > fsconfig(fd_fs, FSCONFIG_SET_STRING, "dax", "always"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "resuid", "1000"); > > > > // wins and creates superblock > > fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...) > > // finds compatible superblock of P1 > > // spins until P1 sets SB_BORN and grabs a reference > > fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...) > > > > fd_mnt1 = fsmount(fd_fs); fd_mnt2 = fsmount(fd_fs); > > move_mount(fd_mnt1, "/A") move_mount(fd_mnt2, "/B") > > > > Not just does P2 get a bind-mount for nearly all filesystems the mount > > options for P2 are usually completely ignored. The VFS itself doesn't > > and shouldn't enforce filesystem specific mount option compatibility. It > > only enforces incompatibility for read-only <-> read-write transitions: > > > > mount -t ext4 /dev/sda /A > > mount -t ext4 -o ro /dev/sda /B > > > > The read-only request will fail with EBUSY as the VFS can't just > > silently transition a superblock from read-write to read-only or vica > > versa without risking security issues. > > > > To userspace this silent superblock reuse can be security issue in > > certain circumstances because there is currently no simple way for them > > to know that they did indeed manage to create the superblock and didn't > > just reuse an existing one. > > > > This adds a new FSCONFIG_CMD_CREATE_EXCL command to fsconfig() that > > returns EBUSY if an existing superblock is found. Userspace that needs > > to be sure that they did create the superblock with the requested mount > > options can request superblock creation using this command. If it > > succeeds they can be sure that they did create the superblock with the > > requested mount options. > > > > This requires the new mount api. With the old mount api we would have to > > plumb this through every legacy filesystem's file_system_type->mount() > > method. If they want this feature they are most welcome to switch to the > > new mount api. > > > > Following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL on > > each high-level superblock creation helper: > > > > (1) get_tree_nodev() > > > > Always allocate new superblock. Hence, FSCONFIG_CMD_CREATE and > > FSCONFIG_CMD_CREATE_EXCL are equivalent. > > > > The binderfs or overlayfs filesystems are examples. > > > > (4) get_tree_keyed() > > > > Finds an existing superblock based on sb->s_fs_info. Hence, > > FSCONFIG_CMD_CREATE would reuse an existing superblock whereas > > FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY. > > > > The mqueue or nfsd filesystems are examples. > > > > (2) get_tree_bdev() > > > > This effectively works like get_tree_keyed(). > > > > The ext4 or xfs filesystems are examples. > > > > (3) get_tree_single() > > > > Only one superblock of this filesystem type can ever exist. > > Hence, FSCONFIG_CMD_CREATE would reuse an existing superblock > > whereas FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY. > > > > The securityfs or configfs filesystems are examples. > > > > This has a further consequence. Some filesystems will never destroy > > the superblock once it has been created. For example, if securityfs > > is mounted the allocated superblock will never be destroyed again as > > long as there is still an LSM making use it. Consequently, even if > > securityfs is unmounted and seemingly destroyed it really isn't > > which means that FSCONFIG_CMD_CREATE_EXCL will continue rejecting > > reusing the existing superblock. > > > > This is unintuitive but not a problem. Such special purpose > > filesystems aren't mounted multiple times anyway. > > > > Following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL on > > filesystems that make use of the low-level sget_fc() helper directly. > > They're all effectively variants on get_tree_keyed() or get_tree_bdev(): > > > > (5) mtd_get_sb() > > > > Similar logic to get_tree_keyed(). > > > > (6) afs_get_tree() > > > > Similar logic to get_tree_keyed(). > > > > (7) ceph_get_tree() > > > > Similar logic to get_tree_keyed(). > > > > Already explicitly allows forcing the allocation of a new superblock > > via CEPH_OPT_NOSHARE. This turns it into get_tree_nodev(). > > > > (8) fuse_get_tree_submount() > > > > Similar logic to get_tree_nodev(). > > > > (9) fuse_get_tree() > > > > Forces reuse of existing FUSE superblock. > > > > Forces reuse of existing superblock if passed in file refers to an > > existing FUSE connection. > > If FSCONFIG_CMD_CREATE_EXCL is specified together with an fd > > referring to an existing FUSE connections this would cause the > > superblock reusal to fail. If reusing is the intent then > > FSCONFIG_CMD_CREATE_EXCL shouldn't be specified. > > > > (10) fuse_get_tree() > > -> get_tree_nodev() > > > > Same logic as in get_tree_nodev(). > > > > (11) fuse_get_tree() > > -> get_tree_bdev() > > > > Same logic as in get_tree_bdev(). > > > > (12) virtio_fs_get_tree() > > > > Same logic as get_tree_keyed(). > > > > (13) gfs2_meta_get_tree() > > > > Forces reuse of existing gfs2 superblock. > > > > Mounting gfs2meta enforces that a gf2s superblock must already > > exist. If not, it will error out. Consequently, mounting gfs2meta > > with FSCONFIG_CMD_CREATE_EXCL would always fail. If reusing is the > > intent then FSCONFIG_CMD_CREATE_EXCL shouldn't be specified. > > > > (14) kernfs_get_tree() > > > > Similar logic to get_tree_keyed(). > > > > (15) nfs_get_tree_common() > > > > Similar logic to get_tree_keyed(). > > > > Already explicitly allows forcing the allocation of a new superblock > > via NFS_MOUNT_UNSHARED. This effectively turns it into > > get_tree_nodev(). > > > > Link: [1] https://lore.kernel.org/linux-block/20230704-fasching-wertarbeit-7c6ffb01c83d@brauner > > Link: [2] https://lore.kernel.org/linux-block/20230705-pumpwerk-vielversprechend-a4b1fd947b65@brauner > > Link: [3] https://lore.kernel.org/linux-fsdevel/20230725-einnahmen-warnschilder-17779aec0a97@brauner > > > > The description and use case seem reasonable to me, the code is straightforward, > I assume you've added RFC only because you want comments on wether we want the > functionality/your approach to fixing the problem is acceptable? For me this Yes, exactly. > makes sense and is simple enough. I have a nit where I think we should say Ok, great. > something like "can't reuse existing file system" instead of "superblock", just Ok. > because we know what superblock means doesn't mean the user will necessarily > understand that out the gate. > > You can add > > Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx> Thanks!