On Fri, Aug 04, 2023 at 03:20:45PM +0200, Christian Brauner wrote: > On Fri, Aug 04, 2023 at 12:14:08PM +0200, Christoph Hellwig wrote: > > FYI, I can reproduce this trivially locally, but even after spending a > > significant time with the trace I'm still puzzled at what is going > > on. I've started trying to make sense of the lockdep report about > > returning to userspace with s_umount held, originall locked in > > get_tree_bdev and am still missing how it could happen. > > So in the old scheme: > > s = alloc_super() > -> down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); > > and assume you're not finding an old one immediately afterwards you'd > > -> spin_lock(&sb_lock) > > static int set_bdev_super(struct super_block *s, void *data) > { > s->s_bdev = data; > s->s_dev = s->s_bdev->bd_dev; > s->s_bdi = bdi_get(s->s_bdev->bd_disk->bdi); > > if (bdev_stable_writes(s->s_bdev)) > s->s_iflags |= SB_I_STABLE_WRITES; > return 0; > } > > -> spin_unlock(&sb_lock) > > in the new scheme you're doing: > > s = alloc_super() > -> down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); > > and assume you're not finding an old one immediately afterwards you'd > > up_write(&s->s_umount); > > error = setup_bdev_super(s, fc->sb_flags, fc); > -> spin_lock(&sb_lock); > sb->s_bdev = bdev; > sb->s_bdi = bdi_get(bdev->bd_disk->bdi); > if (bdev_stable_writes(bdev)) > sb->s_iflags |= SB_I_STABLE_WRITES; > -> spin_unlock(&sb_lock); > > down_write(&s->s_umount); > > Which looks like the lock ordering here is changed? Yes, that none only should be safe, but more importantly should not lead to a return to userspace with s_umount held. Anyway, debugging a regression in mainline right now so I'm taking a break from this one.