Re: [PATCH RFC 0/6] fs,block: yield devices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed 25-10-23 22:46:29, Christian Brauner wrote:
> On Wed, Oct 25, 2023 at 07:20:57PM +0200, Jan Kara wrote:
> > Hello!
> > 
> > On Tue 24-10-23 16:53:38, Christian Brauner wrote:
> > > This is a mechanism that allows the holder of a block device to yield
> > > device access before actually closing the block device.
> > > 
> > > If a someone yields a device then any concurrent opener claiming the
> > > device exclusively with the same blk_holder_ops as the current owner can
> > > wait for the device to be given up. Filesystems by default use
> > > fs_holder_ps and so can wait on each other.
> > > 
> > > This mechanism allows us to simplify superblock handling quite a bit at
> > > the expense of requiring filesystems to yield devices. A filesytems must
> > > yield devices under s_umount. This allows costly work to be done outside
> > > of s_umount.
> > > 
> > > There's nothing wrong with the way we currently do things but this does
> > > allow us to simplify things and kills a whole class of theoretical UAF
> > > when walking the superblock list.
> > 
> > I'm not sure why is it better to create new ->yield callback called under
> > sb->s_umount rather than just move blkdev_put() calls back into
> > ->put_super? Or at least yielding could be done in ->put_super instead of
> 
> The main reason was to not call potentially expensive
> blkdev_put()/bdev_release() under s_umount. If we don't care about this
> though then this shouldn't be a problem.

So I would not be really concerned about performance here. The superblock
is dying, nobody can do anything about that until the superblock is fully
dead and cleaned up. Maybe some places could skip such superblocks more
quickly but so far I'm not convinced it matters in practice (e.g. writeback
holds s_umount over the whole sync(1) time and nobody complains). And as
you mention below, we have been doing this for a long time without anybody
really complaining.

> And yes, then we need to move
> blkdev_put()/bdev_release() under s_umount including the main block
> device. IOW, we need to ensure that all bdev calls are done under
> s_umount before we remove the superblock from the instance list.

This is about those seemingly spurious "device busy" errors when the
superblock hasn't closed its devices yet, isn't it?  But we now remove
superblock from s_instances list in kill_super_notify() and until that
moment SB_DYING is protecting us from racing mounts. So is there some other
problem?

> I think
> that should be fine but I wanted to propose an alternative to that as
> well: cheap mark-for-release under s_umount and heavy-duty without
> s_umount. But I guess it doesn't matter because most filesystems did use
> to close devices under s_umount before anyway. Let me know what you
> think makes the most sense.

I think we should make it as simple as possible for filesystems. As I said
above I don't think s_umount hold time really matters so the only thing
that limits us are locking constraints. During superblock shutdown the only
thing that currently cannot be done under s_umount (that I'm aware of) is the
teardown of the sb->s_bdi because that waits for writeback threads and
those can be blocked waiting for s_umount.

So if we wanted we could just move ext4 & xfs bdev closing back into
->put_super to avoid ext4_kill_sb() and somewhat slim down xfs_mount_free()
but otherwise I don't see much space for simplification or need for adding
more callbacks :)

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux