Re: [PATCH] bcachefs: Fix sysfs warning in fstests generic/730,731

On Mon, Oct 14, 2024 at 08:34:06AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Oct 14, 2024 at 08:10:19AM +0200, Christoph Hellwig wrote:
> > On Sat, Oct 12, 2024 at 02:42:39PM -0400, Kent Overstreet wrote:
> > > sysfs warns if we're removing a symlink from a directory that's no
> > > longer in sysfs; this is triggered by fstests generic/730, which
> > > simulates hot removal of a block device.
> > > 
> > > This patch is however not a correct fix, since checking
> > > kobj->state_in_sysfs on a kobj owned by another subsystem is racy.
> > > 
> > > A better fix would be to add the appropriate check to
> > > sysfs_remove_link() - and sysfs_create_link() as well.
> > 
> > The proper fix is to not link to random other subsystems with
> > object lifetimes you can't know.  I'm not sure why you think adding
> > this link was ever allowed.
> > 
> 
> Odd, I never got the original patch that was sent here in the first
> place...
> 
> Anyway, Christoph is right, this patch isn't ok.  You can't link outside
> of the subdirectory in which you control in sysfs without a whole lot of
> special cases and control.  The use of sysfs for filesystems is almost
> always broken and tricky and full of race conditions (see many past
> threads about this.)  Ideally we would fix this up by offering common
> code for filesystems to use for sysfs (like we do for the driver
> subsystems), but no one has gotten around to it for various reasons.

There is already precedent for this in the block/holder.c code, and
userspace depends on it to determine the topology of virtual block
devices.

And that really is what sysfs is for, determining the actual topology
and relationships between various devices - so if there's a relationship
between devices we need to be able to expose that.

I don't know why bcache never used the block/holder.c code (predates it,
perhaps?) - but that code has been carried over basically unchanged, and
we likely still depend on it (I'd have to dig around in tools...).

Re: the safety issues, I don't agree: this is safe provided you hold a
stable reference to the underlying kobject, which we do, since we have
the block device open. The race is only w.r.t. kobj->state_in_sysfs, and
that could be handled easily within the sysfs/kobject code.
 
> The only filesystem that I can see that attempts to do much like what
> bcachefs does in sysfs is btrfs, but btrfs only seems to have one
> symlink, while you have multiple ones pointing to the same block device.

Not sure where you're seeing that? It's just a single backreference from
the block device to the filesystem object.

> I can't find any sysfs documentation in Documentation/ABI/ so I don't
> really understand what it's attempting to do (and why isn't the tools
> that check this screaming about that lack of documentation, that's
> odd...)  Any hints as to what you are wishing to show here?

Basically, it's the cleanest way (by far) for userspace to look up the
filesystem from the block device: given a path to a block device, stat
it to get the major:minor, then try to open
/sys/dev/block/major:minor/bcachefs/.
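
A minimal sketch of that lookup, for illustration only. The function
names are hypothetical, and it assumes the bcachefs entry under the
block device's sysfs directory is a symlink that resolves to the
filesystem object; this is not code from any existing tool.

```python
import os
import stat


def sysfs_path_for_devnum(major, minor, sysfs_root="/sys"):
    """Build the sysfs path where the bcachefs backreference is expected."""
    return os.path.join(sysfs_root, "dev", "block", f"{major}:{minor}", "bcachefs")


def bcachefs_sysfs_path(dev_path, sysfs_root="/sys"):
    """stat() the device node and map its major:minor to a sysfs path."""
    st = os.stat(dev_path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{dev_path} is not a block device")
    # st_rdev holds the device number of the node itself.
    return sysfs_path_for_devnum(os.major(st.st_rdev), os.minor(st.st_rdev), sysfs_root)


def lookup_filesystem(dev_path):
    """Return the resolved filesystem object path, or None if absent.

    Assumption: the entry is a symlink, so realpath() follows it to the
    filesystem's own sysfs directory.
    """
    path = bcachefs_sysfs_path(dev_path)
    return os.path.realpath(path) if os.path.exists(path) else None
```

A caller would pass a device node path to lookup_filesystem() and treat
None as "not a member of a bcachefs filesystem".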

The alternative would be scanning through /proc/mounts, which is really
nasty - the format isn't particularly cleanly specified, it's racy, and
with containers, systems are reaching thousands of mounts these days.
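
For contrast, a rough sketch of what the /proc/mounts scan looks like.
The helper names are hypothetical, and it assumes a multi-device
bcachefs mount lists its member devices colon-separated in the source
field. Fields are space-separated with special characters written as
octal escapes, and the result is only a snapshot that may be stale by
the time it is used.

```python
import re

# /proc/mounts writes space, tab, newline and backslash as \040, \011, \012, \134.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def unescape_mount_field(field):
    """Decode the octal escapes used in /proc/mounts fields."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def mounts_for_device(dev_path, mounts_text):
    """Return mount points of bcachefs filesystems that list dev_path.

    Assumption: member devices appear colon-separated in the source field.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split(" ")
        if len(fields) < 3:
            continue
        source, target, fstype = (unescape_mount_field(f) for f in fields[:3])
        if fstype == "bcachefs" and dev_path in source.split(":"):
            hits.append(target)
    return hits
```

In practice mounts_text would come from reading /proc/mounts, and every
line has to be scanned, which is where the cost with thousands of mounts
comes from.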



