On Tue, Aug 15, 2023 at 04:43:12PM +0200, Christian Brauner wrote: > > up_write(&s->s_umount); > > - blkdev_put(bdev, fs_type); > > + error = setup_bdev_super(s, flags, NULL); > > down_write(&s->s_umount); > > So I've been looking through the branches to see what's ready for v6.6 > and what needs some more time. While doing so I went over this again and > realized that we have an issue here. > > While it looks like dropping s_umount here and calling > setup_bdev_super() is fine I think it isn't. Consider two processes > racing to create the same mount: > > P1 P2 > vfs_get_tree() vfs_get_tree() > -> get_tree() == get_tree_bdev() -> get_tree() == get_tree_bdev() > -> sget_fc() -> sget_fc() > // allocate new sb; no matching sb found > -> sb_p1 = alloc_super() > -> hlist_add_head(&s->s_instances, &s->s_type->fs_supers) > -> spin_unlock(&sb_lock) > // yield s_umount to avoid deadlocks > -> up_write(&sb->s_umount) > -> spin_lock(&sb_lock) > // find sb_p1 > if (test(old, fc)) > goto share_extant_sb; > // Assume P1 sleeps on bdev_lock or open_mutex > // in blkdev_get_by_dev(). > -> setup_bdev_super() > -> down_write(&sb->s_umount) > > Now P2 jumps to the share_extant_sb label and calls: > > grab_super(sb_p1) > -> spin_unlock(&sb_lock) > -> down_write(&s->s_umount) > > Since s_umount is unlocked P2 doesn't go to sleep and instead immediately > goes to retry by jumping to the "retry" label. If P1 is still sleeping > on a a bdev mutex the same thing happens again. > > So if you have a range of processes P{1,n} that all try to mount the > same device you're hammering endlessly on sb_lock without ever going to > sleep like we used to. The same problem exists for all iterate_supers() That part is wrong. If you have P{1,n} and P1 takes s_umount exclusively then P{2,n} will sleep on s_umount until P1 is done. But there's still at least on process spinning through sget_fc() for no good reason.