Re: [PATCH] block: avoid to serialize blkdev_fallocate

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu 12-05-22 21:48:14, Ming Lei wrote:
> Commit f278eb3d8178 ("block: hold ->invalidate_lock in blkdev_fallocate")
> adds ->invalidate_lock in blkdev_fallocate() for preventing stale data
> from being read during fallocate().
> 
> However, the side effect is that blkdev_fallocate() becomes serialized
> since blkdev_fallocate() always blocks.
> 
> Add one atomic fallocate counter for allowing blkdev_fallocate() to
> be run concurrently so that discard/write_zero bios from different
> fallocate() can be submitted in parallel.
> 
> Cc: Jan Kara <jack@xxxxxxx>
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>

Couple of points:
* What about blk_ioctl_zeroout() or blk_ioctl_discard()?
* This way processes calling discard / zeroout can easily starve process
  wanting to do read which does not seem ideal. Somewhat related is that I
  know that Christoph wanted to modify how we use filemap_invalidate_lock()
  so that huge zeroouts performed by actually writing zeros will not block
  readers for ages and this seems to be going in somewhat opposite direction
  favoring writers even more.
* I suspect this is going to upset lockdep (and lock owner tracking for the
  purposes of spinning) pretty badly because the lock owner may be returning
  to userspace without unlocking the lock.

								Honza

> ---
>  block/fops.c              | 6 ++++--
>  include/linux/blk_types.h | 2 ++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/block/fops.c b/block/fops.c
> index 9f2ecec406b0..368866b15e68 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -651,7 +651,8 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
>  		return -EINVAL;
>  
> -	filemap_invalidate_lock(inode->i_mapping);
> +	if (atomic_inc_return(&bdev->bd_fallocate_count) == 1)
> +		filemap_invalidate_lock(inode->i_mapping);
>  
>  	/* Invalidate the page cache, including dirty pages. */
>  	error = truncate_bdev_range(bdev, file->f_mode, start, end);
> @@ -679,7 +680,8 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  	}
>  
>   fail:
> -	filemap_invalidate_unlock(inode->i_mapping);
> +	if (!atomic_dec_return(&bdev->bd_fallocate_count))
> +		filemap_invalidate_unlock(inode->i_mapping);
>  	return error;
>  }
>  
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 1973ef9bd40f..9ccd841ea8ed 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -58,6 +58,8 @@ struct block_device {
>  	struct gendisk *	bd_disk;
>  	struct request_queue *	bd_queue;
>  
> +	atomic_t		bd_fallocate_count;
> +
>  	/* The counter of freeze processes */
>  	int			bd_fsfreeze_count;
>  	/* Mutex for freeze */
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux