On 2022/03/16 21:26, Jan Kara wrote:
>>> 1) Can you please remind me why do we need to also remove the lo_mutex from
>>>    lo_open()? The original lockdep problem was only for __loop_clr_fd(),
>>>    wasn't it?
>>
>> Right, but from locking dependency chain perspective, holding
>> lo->lo_mutex from lo_open()/lo_release() is treated as if
>> lo_open()/lo_release() waits for flush of I/O with lo->lo_mutex held,
>> even if it is not lo_open()/lo_release() which actually flushes I/O with
>> lo->lo_mutex held.
>>
>> We need to avoid holding lo->lo_mutex from lo_open()/lo_release() while
>> avoiding problems caused by not waiting for loop_configure()/__loop_clr_fd().
>
> I thought the whole point of these patches is to move waiting for IO from
> under lo->lo_mutex and disk->open_mutex?

Excuse me, but I couldn't tell whether you meant "remove ... from ..." or
"move ... from ... to ...". The point of my patch is to avoid the

  disk->open_mutex => lo->lo_mutex
  disk->open_mutex => loop_validate_mutex => lo->lo_mutex

dependency chains, for there is a

  system_transition_mutex/1 => disk->open_mutex

dependency chain, and blk_mq_freeze_queue()/blk_mq_unfreeze_queue() are
called with lo->lo_mutex held.

Christoph's "[PATCH 4/8] loop: don't freeze the queue in lo_release" and
"[PATCH 5/8] loop: only freeze the queue in __loop_clr_fd when needed"
stop calling blk_mq_freeze_queue()/blk_mq_unfreeze_queue() with
lo->lo_mutex held only from lo_release(). There are still locations
(e.g. __loop_update_dio()) which call
blk_mq_freeze_queue()/blk_mq_unfreeze_queue() with lo->lo_mutex held.

> And if we do that, there's no
> problem with open needing either of these mutexes? What am I missing?
>
> Anyway, I have no problem with removing lo->lo_mutex from lo_open() as e.g.
> Christoph done it. I just disliked the task work magic you did there...
>

Hmm, does "these patches" refer to "yet another approach to fix the loop
lock order inversions v3" rather than my "[PATCH v2] loop: don't hold
lo->lo_mutex from lo_open() and lo_release()"?
>>> 2) Cannot we just move the generation of DISK_EVENT_MEDIA_CHANGE event
>>>    after the moment the loop device is configured? That way systemd won't
>>>    start probing the new loop device until it is fully configured.
>>
> Yes, something like that. But disk_force_media_change() has some
> invalidation in it as well and that may need to stay in the original place
> - needs a bit more thought & validation.

Yes, please. I don't know about the invalidation. But can't we instead remove

  if (lo->lo_state != Lo_bound)
          return BLK_STS_IOERR;

from loop_queue_rq()? There is some mechanism which can guarantee that a
crash does not happen even if lo_queue_work() is called
(blk_mq_*freeze_queue()?), isn't there?

>
>> Such change will defer only open()->read() sequence triggered via
>> kobject_uevent(KOBJ_CHANGE). My patch defers all open() from userspace,
>> like we did until 5.11.
>
> Correct. But EIO error is the correct return if you access unbound loop
> device. It is only after we tell userspace the device exists (i.e., after
> we send the event), when we should better make sure the IO succeeds.

Userspace may open this loop device with arbitrary delay; this is an
asynchronous notification. By the moment some userspace process opens this
loop device and issues a read request as a response to "this device is
ready", loop_clr_fd() can be called (because lo_open() no longer holds
lo->lo_mutex) by some other process, and this loop device can become
unbound. There must be some mechanism which can guarantee that a crash
does not happen.

> Sorry, I was wrong here. It will not cause deadlock. But in-kernel openers
> of the block device like swsusp_check() in kernel/power/swap.c (using
> blkdev_get_by_dev()) will be now operating on a loop device without waiting
> for __loop_clr_fd(). Which is I guess fine but the inconsistency between
> in-kernel and userspace bdev openers would be a bit weird.
It looks weird, but I think it is unavoidable in order to keep the kernel
locking dependency chain simple, for in-kernel openers might be holding
e.g. system_transition_mutex/1.

> Anyway,
> Christophs patches look like a cleaner way forward to me...

My patch behaves as if

  int fd = open("/dev/loopX", 3);
  ioctl(fd, LOOP_CLR_FD);
  close(fd);

is automatically called when close() by the last user of /dev/loopX
returns, in order to keep the kernel locking dependency chain short and
simple. Nothing difficult.