On Fri, Aug 11, 2023 at 10:10:42AM +0200, Jan Kara wrote: > On Thu 10-08-23 22:47:03, Kent Overstreet wrote: > > On Thu, Aug 10, 2023 at 07:52:05PM +0200, Jan Kara wrote: > > > On Thu 10-08-23 11:54:53, Kent Overstreet wrote: > > > > > And there clearly is something very strange going on with superblock > > > > > handling > > > > > > > > This deserves an explanation because sget() is a bit nutty. > > > > > > > > The way sget() is conventionally used for block device filesystems, the > > > > block device open _isn't actually exclusive_ - sure, FMODE_EXCL is used, > > > > but the holder is the fs type pointer, so it won't exclude with other > > > > opens of the same fs type. > > > > > > > > That means the only protection from multiple opens scribbling over each > > > > other is sget() itself - but if the bdev handle ever outlives the > > > > superblock we're completely screwed; that's a silent data corruption bug > > > > that we can't easily catch, and if the filesystem teardown path has any > > > > asynchronous stuff going on (and of course it does) that's not a hard > > > > mistake to make. I've observed at least one bug that looked suspiciously > > > > like that, but I don't think I quite pinned it down at the time. > > > > > > This is just being changed - check Christian's VFS tree. There are patches > > > that make sget() use superblock pointer as a bdev holder so the reuse > > > you're speaking about isn't a problem anymore. > > > > So then the question is what do you use for identifying the superblock, > > and you're switching to the dev_t - interesting. > > > > Are we 100% sure that will never break, that a dev_t will always > > identify a unique block_device? Namespacing has been changing things. > > Yes, dev_t is a unique identifier of the device, we rely on that in > multiple places, block device open comes to mind as the first. You're > right namespacing changes things but we implement that as changing what > gets presented to userspace via some mapping layer while the kernel keeps > using globally unique identifiers. Full device namespacing is not on the horizon at all. We've looked into this years ago and it woud be a giant effort that would effect nearly everything if the properly. So even if, there would be so many changes required that reliance on dev_t in the VFS would be the least of our problems.