Re: [PATCH 1/2] block: handle BLK_OPEN_RESTRICT_WRITES correctly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat 23-03-24 17:11:19, Christian Brauner wrote:
> Last kernel release we introduce CONFIG_BLK_DEV_WRITE_MOUNTED. By
> default this option is set. When it is set the long-standing behavior
> of being able to write to mounted block devices is enabled.
> 
> But in order to guard against unintended corruption by writing to the
> block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
> off. In that case it isn't possible to write to mounted block devices
> anymore.
> 
> A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
> which disallows concurrent BLK_OPEN_WRITE access. When we still had the
> bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
> the mode was passed around. Since we managed to get rid of the bdev
> handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
> on whether the file was opened writable and writes to that block device
> are blocked. That logic doesn't work because we do allow
> BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.
> 
> So fix the detection logic. Use O_EXCL as an indicator that
> BLK_OPEN_RESTRICT_WRITES has been requested. We do the exact same thing
> for pidfds where O_EXCL means that this is a pidfd that refers to a
> thread. For userspace open paths O_EXCL will never be retained but for
> internal opens where we open files that are never installed into a file
> descriptor table this is fine.
> 
> Note that BLK_OPEN_RESTRICT_WRITES is an internal only flag that cannot
> directly be raised by userspace. It is implicitly raised during
> mounting.
> 
> Passes xftests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
> unset.
> 
> Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
> Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@xxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>

The fix looks correct but admittedly it looks a bit hacky. I'd prefer
storing the needed information in some other flag, preferably one that does
not already have a special meaning with block devices. But FMODE_ space is
exhausted and don't see another easy solution. So I guess:

Reviewed-by: Jan Kara <jack@xxxxxxx>

Thanks for looking into this!

								Honza

> ---
>  block/bdev.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 7a5f611c3d2e..f819f3086905 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -821,13 +821,12 @@ static void bdev_yield_write_access(struct file *bdev_file)
>  		return;
>  
>  	bdev = file_bdev(bdev_file);
> -	/* Yield exclusive or shared write access. */
> -	if (bdev_file->f_mode & FMODE_WRITE) {
> -		if (bdev_writes_blocked(bdev))
> -			bdev_unblock_writes(bdev);
> -		else
> -			bdev->bd_writers--;
> -	}
> +
> +	/* O_EXCL is only set for internal BLK_OPEN_RESTRICT_WRITES. */
> +	if (bdev_file->f_flags & O_EXCL)
> +		bdev_unblock_writes(bdev);
> +	else if (bdev_file->f_mode & FMODE_WRITE)
> +		bdev->bd_writers--;
>  }
>  
>  /**
> @@ -946,6 +945,13 @@ static unsigned blk_to_file_flags(blk_mode_t mode)
>  	else
>  		WARN_ON_ONCE(true);
>  
> +	/*
> +	 * BLK_OPEN_RESTRICT_WRITES is never set from userspace and
> +	 * O_EXCL is stripped from userspace.
> +	 */
> +	if (mode & BLK_OPEN_RESTRICT_WRITES)
> +		flags |= O_EXCL;
> +
>  	if (mode & BLK_OPEN_NDELAY)
>  		flags |= O_NDELAY;
>  
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux