On Mon 21-08-23 17:52:37, Jan Kara wrote: > On Fri 18-08-23 16:00:50, Christian Brauner wrote: > > Recent patches experiment with making it possible to allocate a new > > superblock before opening the relevant block device. Naturally this has > > intricate side-effects that we get to learn about while developing this. > > > > Superblock allocators such as sget{_fc}() return with s_umount of the > > new superblock held and lock ordering currently requires that block > > level locks such as bdev_lock and open_mutex rank above s_umount. > > > > Before aca740cecbe5 ("fs: open block device after superblock creation") > > ordering was guaranteed to be correct as block devices were opened prior > > to superblock allocation and thus s_umount wasn't held. But now s_umount > > must be dropped before opening block devices to avoid locking > > violations. > > > > This has consequences. The main one being that iterators over > > @super_blocks and @fs_supers that grab a temporary reference to the > > superblock can now also grab s_umount before the caller has managed to > > open block devices and called fill_super(). So whereas before such > > iterators or concurrent mounts would have simply slept on s_umount until > > SB_BORN was set or the superblock was discard due to initalization > > failure they can now needlessly spin through sget{_fc}(). > > > > If the caller is sleeping on bdev_lock or open_mutex one caller waiting > > on SB_BORN will always spin somewhere and potentially this can go on for > > quite a while. > > > > It should be possible to drop s_umount while allowing iterators to wait > > on a nascent superblock to either be born or discarded. This patch > > implements a wait_var_event() mechanism allowing iterators to sleep > > until they are woken when the superblock is born or discarded. > > > > This also allows us to avoid relooping through @fs_supers and > > @super_blocks if a superblock isn't yet born or dying. > > > > Link: aca740cecbe5 ("fs: open block device after superblock creation") > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> > ... > > +/* wake waiters */ > > +#define SUPER_WAKE_FLAGS (SB_BORN | SB_DYING) > > +static void super_wake(struct super_block *sb, unsigned int flag) > > +{ > > + WARN_ON_ONCE((flag & ~SUPER_WAKE_FLAGS)); > > + WARN_ON_ONCE(hweight32(flag & SUPER_WAKE_FLAGS) > 1); > > + > > + /* > > + * Pairs with smp_load_acquire() in super_lock() and > > + * ensures that @flag is set before we wake anyone and ensures > > + * that checking whether waitqueue is active isn't hoisted > > + * before the store of @flag. > > + */ > > + sb->s_flags |= flag; > > + smp_mb(); > > + wake_up_var(&sb->s_flags); > > I think we misunderstood here. I believe we need: > > /* > * Pairs with smp_load_acquire() in super_lock() to make sure > * all initializations in the superblock are seen by the user > * seeing SB_BORN sent. > */ > smp_store_release(&sb->s_flags, sb->s_flags | flag); > /* > * Pairs with the barrier in prepare_to_wait_event() to make sure > * ___wait_var_event() either sees SB_BORN set or > * waitqueue_active() check in wake_up_var() sees the waiter > */ > smp_rmb(); > wake_up_var(&sb->s_flags); > > or we need something equivalent with stronger barriers. Like: > > smp_wmb(); > sb->s_flags |= flag; > smp_rmb(); > wake_up_var(&sb->s_flags); Ah, in both of the above cases we actually need smp_mb() (as you properly chose) instead of smp_rmb() I wrote above because we need to establish write-vs-read ordering. BTW, if you pick one of these two schemes, feel free to add: Reviewed-by: Jan Kara <jack@xxxxxxx> If you decide for something else, I'd like to see the result first... Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR