On 2022/01/12 22:16, Jan Kara wrote:
> I don't think we can fully solve this race in the kernel and IMO it is
> futile to try that - depending on when exactly systemd-udev decides to
> close /dev/loop0 and how exactly mount decides to implement its loop device
> reuse, different strategies will / will not work.

Right, there is no perfect solution. The best we can do is mitigation that
makes hitting this race window less likely.

> But there is one subtlety
> we need to solve - when __loop_clr_fd() is run outside of disk->open_mutex,
> it can happen that next lo_open() runs before __loop_clr_fd().

Yes, but the problem (I/O error) becomes visible only when read() is called.
Also, __loop_clr_fd() from ioctl(LOOP_CLR_FD) runs outside of disk->open_mutex.

> Thus we can
> hand away a file descriptor to a loop device that is half torn-down and
> will be cleaned up in uncontrollable point in the future which can lead to
> interesting consequences when the device is used at that time as well.

Right, but I don't think we can solve this.

>
> Perhaps we can fix these problems by checking lo_refcnt in __loop_clr_fd()
> once we grab lo_mutex and thus make sure the device still should be
> destroyed?

That will not help. Even if __loop_clr_fd() from lo_release() runs under
disk->open_mutex, the following sequence (where lo_open() and vfs_read() are
issued by some other program opening the same loop device)

  lo_release() {
    mutex_lock(&lo->lo_mutex);
    // Decrements lo->lo_refcnt and finds it became 0.
    mutex_unlock(&lo->lo_mutex);
    __loop_clr_fd(lo, true) {
      mutex_lock(&lo->lo_mutex);
      // Finds lo->lo_refcnt is still 0.
      mutex_unlock(&lo->lo_mutex);
      // Release the backing file.
    }
  }
  lo_open() {
    mutex_lock(&lo->lo_mutex);
    // Increments lo->lo_refcnt.
    mutex_unlock(&lo->lo_mutex);
  }
  vfs_read() {
    // Read attempt fails because the backing file is already gone.
  }

might still cause some program to fail with an I/O error. And since
__loop_clr_fd() from loop_clr_fd() does not run under disk->open_mutex, the
following sequence

  loop_clr_fd() {
    mutex_lock(&lo->lo_mutex);
    // Finds lo->lo_refcnt is 1.
    mutex_unlock(&lo->lo_mutex);
    __loop_clr_fd(lo, true) {
      mutex_lock(&lo->lo_mutex);
      // Finds lo->lo_refcnt is still 1.
      mutex_unlock(&lo->lo_mutex);
      lo_open() {
        mutex_lock(&lo->lo_mutex);
        // Increments lo->lo_refcnt.
        mutex_unlock(&lo->lo_mutex);
      }
      // Release the backing file.
    }
  }
  vfs_read() {
    // Read attempt fails because the backing file is already gone.
  }

is still possible.

We don't want to hold disk->open_mutex and/or lo->lo_mutex while an I/O
request on this loop device is issued, do we?

Do we want to hold disk->open_mutex when calling __loop_clr_fd() from
loop_clr_fd()? That might be useful for avoiding some race situations, but it
runs counter to the locking simplification.
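
For reference, below is a rough userspace sketch (mine, not something already
posted in this thread) of how the I/O error above becomes visible: one thread
keeps binding and clearing a backing file on a loop device with
LOOP_SET_FD/LOOP_CLR_FD while another thread keeps opening the same device
node and reading from it. The device path /dev/loop7 and the backing file
path are arbitrary placeholders, and whether read() actually returns -EIO
(rather than e.g. 0 bytes) depends on kernel version and timing, so treat it
as an illustration of the window rather than a reliable reproducer. Run as
root, with BACKING pointing at an existing regular file that contains some
data, on an otherwise unused loop device.

/*
 * Sketch: race LOOP_SET_FD/LOOP_CLR_FD against open()+read() on the same
 * loop device node and report any EIO seen by the reader.
 * Build with: gcc -pthread -o loop-race loop-race.c
 */
#include <errno.h>
#include <fcntl.h>
#include <linux/loop.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define LOOPDEV "/dev/loop7"        /* placeholder: any free loop device */
#define BACKING "/tmp/backing.img"  /* placeholder: existing file with data */

static void *binder(void *arg)
{
	(void)arg;
	for (;;) {
		int loop_fd = open(LOOPDEV, O_RDWR);
		int file_fd = open(BACKING, O_RDWR);

		/* Attach the backing file and immediately detach it again. */
		if (loop_fd >= 0 && file_fd >= 0 &&
		    ioctl(loop_fd, LOOP_SET_FD, file_fd) == 0)
			ioctl(loop_fd, LOOP_CLR_FD, 0);
		if (file_fd >= 0)
			close(file_fd);
		if (loop_fd >= 0)
			close(loop_fd);
	}
	return NULL;
}

static void *reader(void *arg)
{
	char buf[512];

	(void)arg;
	for (;;) {
		int fd = open(LOOPDEV, O_RDONLY);

		if (fd < 0)
			continue;
		/* If the backing file is released between open() and the
		 * I/O request, the read may fail with EIO. */
		if (read(fd, buf, sizeof(buf)) < 0 && errno == EIO)
			fprintf(stderr, "read: EIO (backing file already gone)\n");
		close(fd);
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, binder, NULL);
	pthread_create(&t2, NULL, reader, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

Depending on timing this should exercise both paths discussed above: either
LOOP_CLR_FD finds only the binder's reference and clears the device directly,
or (if the reader still holds the device open) it only sets
LO_FLAGS_AUTOCLEAR and the final close from the reader triggers
__loop_clr_fd() via lo_release().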