On Wed 12-01-22 23:00:00, Tetsuo Handa wrote:
> On 2022/01/12 22:16, Jan Kara wrote:
> > I don't think we can fully solve this race in the kernel and IMO it is
> > futile to try that - depending on when exactly systemd-udev decides to
> > close /dev/loop0 and how exactly mount decides to implement its loop
> > device reuse, different strategies will / will not work.
> 
> Right, there is no perfect solution.
> Only mitigation that makes this race window less likely to be hit is
> possible.
> 
> > But there is one subtlety we need to solve - when __loop_clr_fd() is
> > run outside of disk->open_mutex, it can happen that the next lo_open()
> > runs before __loop_clr_fd().
> 
> Yes, but the problem (I/O error) becomes visible only when read() is
> called. Also, __loop_clr_fd() from ioctl(LOOP_CLR_FD) runs outside of
> disk->open_mutex.

True, that path is similar. But now I have realized that once we decide to
tear down the loop device, we set lo_state to Lo_rundown, so ioctls such as
LOOP_GET_STATUS start to fail and new IO submissions start to fail with an
IO error. So from the userspace POV everything should be consistent.

> > Thus we can hand away a file descriptor to a loop device that is half
> > torn-down and will be cleaned up at an uncontrollable point in the
> > future, which can lead to interesting consequences when the device is
> > used at that time as well.
> 
> Right, but I don't think we can solve this.

Well, we cannot guarantee what the state of the loop device will be when
you open it, but I think we should guarantee that once you have the loop
device open, it will not be torn down under your hands. And now that I
have realized there are those lo_state checks, I think everything is fine
in that regard. I wanted to make sure that a sequence such as:

	fd = open(loop device)
	ioctl(fd, LOOP_GET_STATUS, &stats)
	verify in stats that the loop dev has the parameters we want
	... use the loop device ...

is race free, and that already seems to be the case - once LOOP_GET_STATUS
does not return an error, we know there is no pending loop device shutdown.
So the only bug seems to be in the mount(8) code, where the loop device
reuse detection is racy, and the subtle change in timing caused by your
loop changes makes it break. I'll write to the util-linux maintainer about
this.

> We don't want to hold disk->open_mutex and/or lo->lo_mutex when
> an I/O request on this loop device is issued, do we?

Currently, we may hold both. With your async patch, we hold only lo_mutex.
Now that I better understand the nature of the deadlock, I agree that
holding either still creates the deadlock possibility, because both are
acquired on loop device open. But now that I have reminded myself of the
lo_state handling, I think the following should be safe in __loop_clr_fd():

	/* Just a safety check... */
	if (WARN_ON_ONCE(data_race(lo->lo_state) != Lo_rundown))
		return -ENXIO;

	/*
	 * With Lo_rundown state, no new work can be queued. Flush pending
	 * IO and destroy the workqueue.
	 */
	blk_mq_freeze_queue(lo->lo_queue);
	destroy_workqueue(lo->workqueue);
	mutex_lock(&lo->lo_mutex);
	... do the rest of the teardown ...

> Do we want to hold disk->open_mutex when calling __loop_clr_fd() from
> loop_clr_fd()? That might be useful for avoiding some race situations,
> but it runs counter to the locking simplification.

No, I'm not suggesting that.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
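
PS: For concreteness, here is the userspace sequence above as a compilable
sketch. It is only an illustration of the pattern, not code taken from
mount(8) - the device path and the expected backing file are made-up
placeholders, and it uses the LOOP_GET_STATUS64 variant of the ioctl:

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <unistd.h>
	#include <linux/loop.h>

	int main(void)
	{
		const char *dev = "/dev/loop0";	/* placeholder device */
		struct loop_info64 info;
		int fd;

		fd = open(dev, O_RDWR);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		/*
		 * Once the device is unbound or being torn down (lo_state
		 * is Lo_rundown), this ioctl fails, so per the discussion
		 * above a success here means no loop device shutdown is
		 * pending at this point.
		 */
		if (ioctl(fd, LOOP_GET_STATUS64, &info) < 0) {
			perror("LOOP_GET_STATUS64");
			close(fd);
			return 1;
		}

		/* Verify the device has the parameters we want. */
		if (strncmp((const char *)info.lo_file_name,
			    "/path/we/expect", LO_NAME_SIZE)) {
			fprintf(stderr, "%s backs a different file\n", dev);
			close(fd);
			return 1;
		}

		/* ... use the loop device via fd from here on ... */
		close(fd);
		return 0;
	}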