On 2022/01/12 22:16, Jan Kara wrote:
> I don't think we can fully solve this race in the kernel and IMO it is
> futile to try that - depending on when exactly systemd-udev decides to
> close /dev/loop0 and how exactly mount decides to implement its loop device
> reuse, different strategies will / will not work.

Right, there is no perfect solution. The best we can do is mitigation that
makes hitting this race window less likely.

> But there is one subtlety
> we need to solve - when __loop_clr_fd() is run outside of disk->open_mutex,
> it can happen that next lo_open() runs before __loop_clr_fd().

Yes, but the problem (I/O error) becomes visible only when read() is called.
Also, __loop_clr_fd() from ioctl(LOOP_CLR_FD) runs outside of disk->open_mutex.

> Thus we can
> hand away a file descriptor to a loop device that is half torn-down and
> will be cleaned up in uncontrollable point in the future which can lead to
> interesting consequences when the device is used at that time as well.

Right, but I don't think we can solve this.

>
> Perhaps we can fix these problems by checking lo_refcnt in __loop_clr_fd()
> once we grab lo_mutex and thus make sure the device still should be
> destroyed?

That will not help. Even if __loop_clr_fd() from lo_release() runs under
disk->open_mutex, the following sequence (where lo_open() and vfs_read() are
issued by some other program opening the same loop device)

  lo_release() {
    mutex_lock(&lo->lo_mutex);
    // Decrements lo->lo_refcnt and finds it became 0.
    mutex_unlock(&lo->lo_mutex);
    __loop_clr_fd(lo, true) {
      mutex_lock(&lo->lo_mutex);
      // Finds lo->lo_refcnt is still 0.
      mutex_unlock(&lo->lo_mutex);
      // Release the backing file.
    }
  }
  lo_open() {
    mutex_lock(&lo->lo_mutex);
    // Increments lo->lo_refcnt.
    mutex_unlock(&lo->lo_mutex);
  }
  vfs_read() {
    // Read attempt fails because the backing file is already gone.
  }

might still cause some program to fail with an I/O error. And since
__loop_clr_fd() from loop_clr_fd() does not run under disk->open_mutex, the
following sequence

  loop_clr_fd() {
    mutex_lock(&lo->lo_mutex);
    // Finds lo->lo_refcnt is 1.
    mutex_unlock(&lo->lo_mutex);
    __loop_clr_fd(lo, true) {
      mutex_lock(&lo->lo_mutex);
      // Finds lo->lo_refcnt is still 1.
      mutex_unlock(&lo->lo_mutex);
      lo_open() {
        mutex_lock(&lo->lo_mutex);
        // Increments lo->lo_refcnt.
        mutex_unlock(&lo->lo_mutex);
      }
      // Release the backing file.
    }
  }
  vfs_read() {
    // Read attempt fails because the backing file is already gone.
  }

is still possible.

We don't want to hold disk->open_mutex and/or lo->lo_mutex while an I/O
request on this loop device is issued, do we?

Do we want to hold disk->open_mutex when calling __loop_clr_fd() from
loop_clr_fd()? That might be useful for avoiding some race situations, but it
runs counter to the locking simplification.
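
For reference, below is a rough userspace sketch (mine, not something already
posted in this thread) of how the I/O error above becomes visible: one thread
keeps binding and clearing a backing file on a loop device with
LOOP_SET_FD/LOOP_CLR_FD while another thread keeps opening the same device
node and reading from it. The device path /dev/loop7 and the backing file
path are arbitrary placeholders, and whether read() actually returns -EIO
(rather than e.g. 0 bytes) depends on kernel version and timing, so treat it
as an illustration of the window rather than a reliable reproducer. Run as
root, with BACKING pointing at an existing regular file that contains some
data, on an otherwise unused loop device.

/*
 * Sketch: race LOOP_SET_FD/LOOP_CLR_FD against open()+read() on the same
 * loop device node and report any EIO seen by the reader.
 * Build with: gcc -pthread -o loop-race loop-race.c
 */
#include <errno.h>
#include <fcntl.h>
#include <linux/loop.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define LOOPDEV "/dev/loop7"        /* placeholder: any free loop device */
#define BACKING "/tmp/backing.img"  /* placeholder: existing file with data */

static void *binder(void *arg)
{
	(void)arg;
	for (;;) {
		int loop_fd = open(LOOPDEV, O_RDWR);
		int file_fd = open(BACKING, O_RDWR);

		/* Attach the backing file and immediately detach it again. */
		if (loop_fd >= 0 && file_fd >= 0 &&
		    ioctl(loop_fd, LOOP_SET_FD, file_fd) == 0)
			ioctl(loop_fd, LOOP_CLR_FD, 0);
		if (file_fd >= 0)
			close(file_fd);
		if (loop_fd >= 0)
			close(loop_fd);
	}
	return NULL;
}

static void *reader(void *arg)
{
	char buf[512];

	(void)arg;
	for (;;) {
		int fd = open(LOOPDEV, O_RDONLY);

		if (fd < 0)
			continue;
		/* If the backing file is released between open() and the
		 * I/O request, the read may fail with EIO. */
		if (read(fd, buf, sizeof(buf)) < 0 && errno == EIO)
			fprintf(stderr, "read: EIO (backing file already gone)\n");
		close(fd);
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, binder, NULL);
	pthread_create(&t2, NULL, reader, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

Depending on timing this should exercise both paths discussed above: either
LOOP_CLR_FD finds only the binder's reference and clears the device directly,
or (if the reader still holds the device open) it only sets
LO_FLAGS_AUTOCLEAR and the final close from the reader triggers
__loop_clr_fd() via lo_release().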