Re: [PATCH v2 3/4] super: wait for nascent superblocks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri 18-08-23 14:46:57, Christian Brauner wrote:
> On Fri, Aug 18, 2023 at 02:02:15PM +0200, Jan Kara wrote:
> > On Fri 18-08-23 12:54:17, Christian Brauner wrote:
> > > Recent patches experiment with making it possible to allocate a new
> > > superblock before opening the relevant block device. Naturally this has
> > > intricate side-effects that we get to learn about while developing this.
> > > 
> > > Superblock allocators such as sget{_fc}() return with s_umount of the
> > > new superblock held and lock ordering currently requires that block
> > > level locks such as bdev_lock and open_mutex rank above s_umount.
> > > 
> > > Before aca740cecbe5 ("fs: open block device after superblock creation")
> > > ordering was guaranteed to be correct as block devices were opened prior
> > > to superblock allocation and thus s_umount wasn't held. But now s_umount
> > > must be dropped before opening block devices to avoid locking
> > > violations.
> > > 
> > > This has consequences. The main one being that iterators over
> > > @super_blocks and @fs_supers that grab a temporary reference to the
> > > superblock can now also grab s_umount before the caller has managed to
> > > open block devices and called fill_super(). So whereas before such
> > > iterators or concurrent mounts would have simply slept on s_umount until
> > > SB_BORN was set or the superblock was discard due to initalization
> > > failure they can now needlessly spin through sget{_fc}().
> > > 
> > > If the caller is sleeping on bdev_lock or open_mutex one caller waiting
> > > on SB_BORN will always spin somewhere and potentially this can go on for
> > > quite a while.
> > > 
> > > It should be possible to drop s_umount while allowing iterators to wait
> > > on a nascent superblock to either be born or discarded. This patch
> > > implements a wait_var_event() mechanism allowing iterators to sleep
> > > until they are woken when the superblock is born or discarded.
> > > 
> > > This also allows us to avoid relooping through @fs_supers and
> > > @super_blocks if a superblock isn't yet born or dying.
> > > 
> > > Link: aca740cecbe5 ("fs: open block device after superblock creation")
> > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> > 
> > Looks mostly good to me. I've spotted only a couple of nits and one
> > possible memory ordering issue...
...
> > > +/* wait and acquire read-side of @sb->s_umount */
> > > +static inline bool super_lock_shared(struct super_block *sb)
> > > +{
> > > +	return super_lock(sb, false);
> > > +}
> > > +
> > > +/* wait and acquire write-side of @sb->s_umount */
> > > +static inline bool super_lock_excl(struct super_block *sb)
> > > +{
> > > +	return super_lock(sb, true);
> > > +}
> > > +
> > > +/* wake waiters */
> > > +#define SUPER_WAKE_FLAGS (SB_BORN | SB_DYING)
> > > +static void super_wake(struct super_block *sb, unsigned int flag)
> > > +{
> > > +	unsigned int flags = sb->s_flags;
> > > +
> > > +	WARN_ON_ONCE((flag & ~SUPER_WAKE_FLAGS));
> > > +	WARN_ON_ONCE(hweight32(flag & SUPER_WAKE_FLAGS) > 1);
> > 
> > Maybe assert here that s_umount is held?
> 
> I think that should be asserted in callers because we don't hold it when
> we do wake SB_DEAD in deactivate_locked_super() because we don't have or
> need it.

Right, I've realized that after I've sent this.

> > > +
> > > +	/*
> > > +	 * Pairs with smp_load_acquire() in super_lock() and
> > > +	 * ensures that @flag is set before we wake anyone.
> > > +	 */
> > > +	smp_store_release(&sb->s_flags, flags | flag);
> > > +	wake_up_var(&sb->s_flags);
> > 
> > As I'm thinking about it now, we may need at least a smp_rmb() between the
> > store and wake_up_var(). What I'm worried about is the following:
> > 
> > TASK1					TASK2
> > super_wake()				super_lock()
> > 					  check s_flags, SB_BORN not set yet
> >   waitqueue_active() from wake_up_var()
> >     which got reordered by the CPU before
> >     smp_store_release(). This seems possible
> >     because release is a one-way permeable in
> >     this direction.
> > 					  wait_var_event(..)
> > 					    prepare_to_wait_event()
> > 					    wait_born()
> > 					      SB_BORN still not set => sleep
> >   smp_store_release() sets SB_BORN
> >   wake_up_var() does nothing because it thinks
> >     the waitqueue is empty.
> 
> Then I propse we use smp_mb() here similar to what we do for __I_NEW.
> Does that sounds ok?

Sure, fine by me.

								Honza

-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux