> > tree of any filesystem (in-tree one or not) will have to go through the > > changes and figure out WTF to do with their existing code. We are > > going to play whack-a-mole for at least several years as development > > branches get rebased and merged. > > Let me write something up. So here I've written two porting.rst patches that aim to reflect the current state of things (They do _not_ reflect what's in Christoph's series here as that'ss again pretty separate and will require additional spelling out.). I'm adding explanation for both the old and new logic fwiw. I hope to upstream these docs soon so we all have something to point to. >From 200666901f53db74edf309d48e3c74fd275a822a Mon Sep 17 00:00:00 2001 From: Christian Brauner <brauner@xxxxxxxxxx> Date: Fri, 15 Sep 2023 16:01:02 +0200 Subject: [PATCH 1/2] porting: document new block device opening order Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> --- Documentation/filesystems/porting.rst | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index deac4e973ddc..f436b64b77bf 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -949,3 +949,27 @@ mmap_lock held. All in-tree users have been audited and do not seem to depend on the mmap_lock being held, but out of tree users should verify for themselves. If they do need it, they can return VM_FAULT_RETRY to be called with the mmap_lock held. + +--- + +**mandatory** + +The order of opening block devices and matching or creating superblocks has +changed. + +The old logic opened block devices first and then tried to find a +suitable superblock to reuse based on the block device pointer. + +The new logic finds or creates a superblock first, opening block devices +afterwards. Since opening block devices cannot happen under s_umount because of +lock ordering requirements s_umount is now dropped while opening block +devices and reacquired before calling fill_super(). + +In the old logic concurrent mounters would find the superblock on the list of +active superblock for the filesystem type. Since the first opener of the block +device would hold s_umount they would wait until the superblock became either +born or died prematurely due to initialization failure. + +Since the new logic drops s_umount concurrent mounters could grab s_umount and +would spin. Instead they are now made to wait using an explicit wait-wake +mechanism without having to hold s_umount. -- 2.34.1 >From 1f09898322b4402219d8d3219d399c9e56a76bae Mon Sep 17 00:00:00 2001 From: Christian Brauner <brauner@xxxxxxxxxx> Date: Fri, 15 Sep 2023 16:01:40 +0200 Subject: [PATCH 2/2] porting: document superblock as block device holder Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> --- Documentation/filesystems/porting.rst | 79 +++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index f436b64b77bf..fefefaf289b4 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -973,3 +973,82 @@ born or died prematurely due to initialization failure. Since the new logic drops s_umount concurrent mounters could grab s_umount and would spin. Instead they are now made to wait using an explicit wait-wake mechanism without having to hold s_umount. + +--- + +**mandatory** + +The holder of a block device is now the superblock. + +The holder of a block device used to be the file_system_type which wasn't +particularly useful. It wasn't possible to go from block device to owning +superblock without matching on the device pointer stored in the superblock. +This mechanism would only work for a single device so the block layer couldn't +find the owning superblock associated with additional devices. + +In the old mechanism reusing or creating a superblock for racing mount(2) and +umount(2) relied on the file_system_type as the holder. This was severly +underdocumented however: + +(1) If the concurrent mount(2) managed to grab an active reference before the + umount(2) dropped the last active reference in deactivate_locked_super() + the mounter would simply reuse the existing superblock. + +(2) If the mounter came after deactivate_locked_super() but before + the superblock had been removed from the list of superblocks of the + filesystem type the mounter would wait until the superblock was shutdown + and allocated a new superblock. + +(3) If the mounter came after deactivate_locked_super() and after + the superblock had been removed from the list of superblocks of the + filesystem type the mounter would allocate a new superblock. + +Because the holder of the block device was the filesystem type any concurrent +mounter could open the block device without risking seeing EBUSY because the +block device was still in use. + +Making the superblock the owner of the block device changes this as the holder +is now a unique superblock and not shared among all superblocks of the +filesystem type. So a concurrent mounter in (2) could suddenly see EBUSY when +trying to open a block device whose holder was a different superblock. + +The new logic thus waits until the superblock and the devices are shutdown in +->kill_sb(). Removal of the superblock from the list of superblocks of the +filesystem type is now moved to a later point when the devices are closed: + +(1) Any concurrent mounter managing to grab an active reference on an existing + superblock is made to wait until the superblock is either ready or until + the superblock and all devices are shutdown in ->kill_sb(). + +(2) If the mounter came after deactivate_locked_super() but before + the superblock had been removed from the list of superblocks of the + filesystem type the mounter is made to wait until the superblock and the + devices are shut down in ->kill_sb() and the superblock is removed from the + list of superblocks of the filesystem type. + +(3) This case is now collapsed into (2) as the superblock is left on the list + of superblocks of the filesystem type until all devices are shutdown in + ->kill_sb(). + +As this is a VFS level change it has no practical consequences for filesystems +other than that all of them must use one of the provided kill_litter_super(), +kill_anon_super(), or kill_block_super() helpers. + +Filesystems that reuse superblocks based on non-static keys such as +sb->s_fs_info must ensure that these keys remain valid across kill_*_super() +calls. The expected pattern is:: + + static struct file_system_type some_fs_type = { + .name = "somefs", + .kill_sb = some_fs_kill_sb, + }; + + static void some_fs_kill_sb(struct super_block *sb) + { + struct some_fs_info *info = sb->s_fs_info; + + kill_*_super(sb); + kfree(info); + } + +It's best practice to never deviate from this pattern. -- 2.34.1