On Wed 16-08-23 09:29:08, Christian Brauner wrote: > On Tue, Aug 15, 2023 at 04:43:12PM +0200, Christian Brauner wrote: > > > up_write(&s->s_umount); > > > - blkdev_put(bdev, fs_type); > > > + error = setup_bdev_super(s, flags, NULL); > > > down_write(&s->s_umount); > > > > So I've been looking through the branches to see what's ready for v6.6 > > and what needs some more time. While doing so I went over this again and > > realized that we have an issue here. > > > > While it looks like dropping s_umount here and calling > > setup_bdev_super() is fine I think it isn't. Consider two processes > > racing to create the same mount: > > > > P1 P2 > > vfs_get_tree() vfs_get_tree() > > -> get_tree() == get_tree_bdev() -> get_tree() == get_tree_bdev() > > -> sget_fc() -> sget_fc() > > // allocate new sb; no matching sb found > > -> sb_p1 = alloc_super() > > -> hlist_add_head(&s->s_instances, &s->s_type->fs_supers) > > -> spin_unlock(&sb_lock) > > // yield s_umount to avoid deadlocks > > -> up_write(&sb->s_umount) > > -> spin_lock(&sb_lock) > > // find sb_p1 > > if (test(old, fc)) > > goto share_extant_sb; > > // Assume P1 sleeps on bdev_lock or open_mutex > > // in blkdev_get_by_dev(). > > -> setup_bdev_super() > > -> down_write(&sb->s_umount) > > > > Now P2 jumps to the share_extant_sb label and calls: > > > > grab_super(sb_p1) > > -> spin_unlock(&sb_lock) > > -> down_write(&s->s_umount) > > > > Since s_umount is unlocked P2 doesn't go to sleep and instead immediately > > goes to retry by jumping to the "retry" label. If P1 is still sleeping > > on a a bdev mutex the same thing happens again. > > > > So if you have a range of processes P{1,n} that all try to mount the > > same device you're hammering endlessly on sb_lock without ever going to > > sleep like we used to. The same problem exists for all iterate_supers() > > That part is wrong. If you have P{1,n} and P1 takes s_umount exclusively > then P{2,n} will sleep on s_umount until P1 is done. But there's still > at least on process spinning through sget_fc() for no good reason. No, you're right that the second process is going to effectively busyloop waiting for SB_BORN to be set. I agree we should add some sleeping wait to the loop to avoid pointlessly burning CPU cycles. I'll look into some elegant solution tomorrow. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR