Hi Jens, > -----Original Message----- > From: Jens Axboe <axboe@xxxxxxxxx> > Sent: Wednesday, May 22, 2024 7:10 AM > To: Gulam Mohamed <gulam.mohamed@xxxxxxxxxx>; linux- > block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx > Cc: shinichiro.kawasaki@xxxxxxx; chaitanyak@xxxxxxxxxx; hch@xxxxxx > Subject: Re: [PATCH V2 for-6.10/block 1/2] loop: Fix a race between loop > detach and loop open > > On 5/21/24 4:42 PM, Gulam Mohamed wrote: > > Description > > =========== > > > > 1. Userspace sends the command "losetup -d" which uses the open() call > > to open the device > > 2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the > > function loop_clr_fd() > > 3. If LOOP_CLR_FD is the first command received at the time, then the > > AUTOCLEAR flag is not set and deletion of the > > loop device proceeds ahead and scans the partitions (drop/add > > partitions) > > > > if (disk_openers(lo->lo_disk) > 1) { > > lo->lo_flags |= LO_FLAGS_AUTOCLEAR; > > loop_global_unlock(lo, true); > > return 0; > > } > > > > 4. Before scanning partitions, it will check to see if any partition of > > the loop device is currently opened 5. If any partition is > > opened, then it will return EBUSY: > > > > if (disk->open_partitions) > > return -EBUSY; > > 6. So, after receiving the "LOOP_CLR_FD" command and just before the > above > > check for open_partitions, if any other command > > (like blkid) opens any partition of the loop device, then the partition > > scan will not proceed and EBUSY is returned as shown in above code > > 7. But in "__loop_clr_fd()", this EBUSY error is not propagated 8. We > > have noticed that this is causing the partitions of the loop to > > remain stale even after the loop device is detached resulting in the > > IO errors on the partitions > > > > Fix > > --- > > Re-introduce the lo_open() call to restrict any process to open the > > loop device when its being detached > > > > Test case > > ========= > > Test case involves the following two scripts: > > > > script1.sh > > ---------- > > while [ 1 ]; > > do > > losetup -P -f /home/opt/looptest/test10.img > > blkid /dev/loop0p1 > > done > > > > script2.sh > > ---------- > > while [ 1 ]; > > do > > losetup -d /dev/loop0 > > done > > > > Without fix, the following IO errors have been observed: > > > > kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16) > > kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700 > > phys_seg 1 prio class 0 > > kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0 > > phys_seg 1 prio class 0 > > kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page > > read > > > > V1->V2: > > Added a test case, 010, in blktests in tests/loop/ > > Signed-off-by: Gulam Mohamed <gulam.mohamed@xxxxxxxxxx> > > --- > > drivers/block/loop.c | 19 +++++++++++++++++++ > > 1 file changed, 19 insertions(+) > > > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c index > > 28a95fd366fe..9a235d8c062d 100644 > > --- a/drivers/block/loop.c > > +++ b/drivers/block/loop.c > > @@ -1717,6 +1717,24 @@ static int lo_compat_ioctl(struct block_device > > *bdev, blk_mode_t mode, } #endif > > > > +static int lo_open(struct gendisk *disk, blk_mode_t mode) { > > + struct loop_device *lo = disk->private_data; > > + int err; > > + > > + if (!lo) > > + return -ENXIO; > > + > > + err = mutex_lock_killable(&lo->lo_mutex); > > + if (err) > > + return err; > > + > > + if (lo->lo_state == Lo_rundown) > > + err = -ENXIO; > > + mutex_unlock(&lo->lo_mutex); > > + return err; > > +} > > Most of this function uses spaces rather than tabs. I am sorry I should have checked that. Will address the issue in next version > > -- > Jens Axboe