On Thu, Sep 14, 2023 at 05:58:05PM +0100, Al Viro wrote: > On Thu, Sep 14, 2023 at 04:02:25PM +0200, Christian Brauner wrote: > > > Yes, you're right that making the superblock and not the filesytem type > > the bd_holder changes the logic and we are aware of that of course. And > > it requires changes such as moving additional block device closing from > > where some callers currently do it. > > Details, please? Filesystems like xfs and ext4 that closed additional block devices (For example, the logdev= mount option for xfs.) in put_super() could go through stuff like: blkdev_put() -> bdev->bd_disk->fops->release() == lo_release() -> __loop_clr_fd() -> disk_force_media_change() -> __invalidate_device() -> get_super() which wouldn't have been a problem before because get_super() matched on sb->s_bdev which obviously doesn't work because a log device or rt device or whatever isn't the main block device. So we couldn't have deadlocked. But the fact that it is called in that manner from that place in the first place is wildly adventurous especially considering that there isn't __a single comment__ in that code why that is safe. So good luck figuring this all out. Now that we don't have to do that s_bdev matching thing anymore because we directly associate the superblock with the block device we can go straight from block device to superblock. But now calling blkdev_put() under put_super() which holds s_umount could deadlock. So it's moved to kill_sb where it should've always been called. Even without the potential deadlock in the new scheme that's cleaner and easier to understand imho and it just works for any block device. > Note that Christoph's series has mashed (2) and (3) together, resulting > in UAF in a bunch of places. And I'm dead serious about Yes, that I did fix as far as I'm aware. If the rules would've been written down where when something was freed we would've had an easier time figuring this out though. But they weren't so we missed it. > Documentation/filesystems/porting being the right place; any development Yes, agreed. I'll write a document for Christoph's next version. I know that what you're saying is roughly that we shouldn't make the same mistake as were done before but the fact that the old lifetime rules weren't documented in any meaningful way and now we get grumbled at in turn makes me grumble a bit. :) But overall point duly taken. > tree of any filesystem (in-tree one or not) will have to go through the > changes and figure out WTF to do with their existing code. We are > going to play whack-a-mole for at least several years as development > branches get rebased and merged. Let me write something up. > > Incidentally, I'm going to add a (belated by 10 years) chunk in porting.rst > re making sure that anything in superblock that might be needed by methods > called in RCU mode should *not* be freed without an RCU delay... Should've > done that back in 3.12 merge window when RCU'd vfsmounts went in; as it > is, today we have several filesystems with exact same kind of breakage. > hfsplus and affs breakage had been there in 3.13 (missed those two), exfat > and ntfs3 - introduced later, by initial merges of filesystems in question. > Missed on review... Cool, thanks for adding that.