Re: [PATCH 7/8] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release

Jan Kara <jack@xxxxxxx> · Wed, 16 Mar 2022 12:22:58 +0100

On Wed 16-03-22 09:45:18, Christoph Hellwig wrote:
> lo_refcount counts how many openers a loop device has, but that count
> is already provided by the block layer in the bd_openers field of the
> whole-disk block_device.  Remove lo_refcount and allow opens to
> succeed even on devices beeing deleted - now that ->free_disk is
> implemented we can handle that race gracefull and all I/O on it will
> just fail. Similarly there is a small race window now where
> loop_control_remove does not synchronize the delete vs the remove
> due do bd_openers not being under lo_mutex protection, but we can
> handle that just as gracefully.

Honestly, I'm a bit uneasy about these races and ability to have open
removed loop device (given use of loop devices in container environments
potential problems could have security implications). But I guess it is no
different to having open hot-unplugged scsi disk. There may be just an
expectation from userspace that your open either blocks loop device removal
or open fails. But I guess we can deal with that if some real breakage
happens - it does not seem that hard to solve - we just need
loop_control_remove() to grab disk->open_mutex to make transition to
Lo_deleting state safe and keep Lo_deleting check in lo_open(). Plus we'd
need to use READ_ONCE / WRITE_ONCE for lo_state accesses.

> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> ---
>  drivers/block/loop.c | 36 +++++++-----------------------------
>  drivers/block/loop.h |  1 -
>  2 files changed, 7 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index cbaa18bcad1fe..c270f3715d829 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1244,7 +1244,7 @@ static int loop_clr_fd(struct loop_device *lo)
>  	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
>  	 * command to fail with EBUSY.
>  	 */
> -	if (atomic_read(&lo->lo_refcnt) > 1) {
> +	if (lo->lo_disk->part0->bd_openers > 1) {

But bd_openers can be read safely only under disk->open_mutex. So for this
to be safe against compiler playing nasty tricks with optimizations, we
need to either make bd_openers atomic_t or use READ_ONCE / WRITE_ONCE when
accessing it.

> @@ -1724,33 +1724,15 @@ static int lo_compat_ioctl(struct block_device *bdev, fmode_t mode,
>  }
>  #endif
>  
> -static int lo_open(struct block_device *bdev, fmode_t mode)
> -{
> -	struct loop_device *lo = bdev->bd_disk->private_data;
> -	int err;
> -
> -	err = mutex_lock_killable(&lo->lo_mutex);
> -	if (err)
> -		return err;
> -	if (lo->lo_state == Lo_deleting)
> -		err = -ENXIO;
> -	else
> -		atomic_inc(&lo->lo_refcnt);
> -	mutex_unlock(&lo->lo_mutex);
> -	return err;
> -}
> -

Tetsuo has observed [1] that not grabbing lo_mutex when opening loop device
tends to break systemd-udevd because in loop_configure() we send
DISK_EVENT_MEDIA_CHANGE event before the device is fully configured (but
the configuration gets finished before we release the lo_mutex) and so
systemd-udev gets spurious IO errors when probing new loop devices and is
unhappy. So I think this is the right way to go but we need to reshuffle
loop_configure() a bit first.

[1] https://lore.kernel.org/all/a72c59c6-298b-e4ba-b1f5-2275afab49a1@xxxxxxxxxxxxxxxxxxx/T/#u

								Honza

-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR