On Fri, Mar 22, 2024 at 02:10:30PM +0100, Jan Kara wrote: > > End result: > > > > * old ->i_private leaked (already grabbed by your get_bdev_file()) > > * ->bd_openers at 1 (after your bdev_open() gets through) > > * ->i_private left NULL. > > > > Christoph, could we please get rid of that atomic_t nonsense? > > It only confuses people into brainos like that. It really > > needs ->open_mutex for any kind of atomicity. > > Well, there are a couple of places where we end up reading bd_openers > without ->open_mutex. Sure these places are racy wrt other opens / closes > so they need to be careful but we want to make sure we read back at least > some sane value which is not guaranteed with normal int and compiler > possily playing weird tricks when updating it. But yes, we could convert > the atomic_t to using READ_ONCE + WRITE_ONCE in appropriate places to avoid > these issues and make it more obvious bd_openers are not really handled in > an atomic way. What WRITE_ONE()? We really shouldn't modify it without ->open_mutex; do we ever do that? In current mainline: in blkdev_get_whole(), both callers under ->open_mutex: block/bdev.c:671: if (!atomic_read(&bdev->bd_openers)) block/bdev.c:675: atomic_inc(&bdev->bd_openers); in blkdev_put_whole(), the sole caller under ->open_mutex: block_mutex/bdev.c:681: if (atomic_dec_and_test(&bdev->bd_openers)) in blkdev_get_part(), both callers under ->open_mutex: block/bdev.c:700: if (!atomic_read(&part->bd_openers)) { block/bdev.c:704: atomic_inc(&part->bd_openers); in blkdev_put_whole(), the sole caller under ->open_mutex: block/bdev.c:741: if (atomic_dec_and_test(&part->bd_openers)) { in bdev_release(), a deliberately racy reader, commented as such: block/bdev.c:1032: if (atomic_read(&bdev->bd_openers) == 1) in sync_bdevs(), under ->open_mutex: block/bdev.c:1163: if (!atomic_read(&bdev->bd_openers)) { in bdev_del_partition(), under ->open_mutex: block/partitions/core.c:460: if (atomic_read(&part->bd_openers)) and finally, in disk_openers(), a racy reader: include/linux/blkdev.h:231: return atomic_read(&disk->part0->bd_openers); So that's two READ_ONCE() and a bunch of reads and writes under ->open_mutex. Callers of disk_openers() need to be careful and looking through those... Some of them are under ->open_mutex (either explicitly, or as e.g. lo_release() called only via bdev ->release(), which comes only under ->open_mutex), but four of them are not: arch/um/drivers/ubd_kern.c:1023: if (disk_openers(ubd_dev->disk)) in ubd_remove(). Racy, possibly a bug. AFAICS, it's accessible through UML console and there's nothing to stop it from racing with open(). drivers/block/loop.c:1245: if (disk_openers(lo->lo_disk) > 1) { in loop_clr_fd(). Under loop's private lock, but that's likely to be a race - ->bd_openers updates are not under that. Note that there's no ->open() for /dev/loop, BTW... drivers/block/loop.c:2161: if (lo->lo_state != Lo_unbound || disk_openers(lo->lo_disk) > 0) { in loop_control_remove(). Similar to the previous one, except that it's done out of band *and* it doesn't have the "autoclean" logics to work around udev, lovingly described in the comment before the call in loop_clr_fd(). drivers/block/nbd.c:1279: if (disk_openers(nbd->disk) > 1) in nbd_bdev_reset(). Under nbd private mutex (->config_lock), so there's some exclusion with nbd_open(), but ->bd_openers change comes outside of that. Might or might not be a bug - I need to wake up properly to look through that.