On Thu 12-05-22 21:48:14, Ming Lei wrote: > Commit f278eb3d8178 ("block: hold ->invalidate_lock in blkdev_fallocate") > adds ->invalidate_lock in blkdev_fallocate() for preventing stale data > from being read during fallocate(). > > However, the side effect is that blkdev_fallocate() becomes serialized > since blkdev_fallocate() always blocks. > > Add one atomic fallocate counter for allowing blkdev_fallocate() to > be run concurrently so that discard/write_zero bios from different > fallocate() can be submitted in parallel. > > Cc: Jan Kara <jack@xxxxxxx> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> Couple of points: * What about blk_ioctl_zeroout() or blk_ioctl_discard()? * This way processes calling discard / zeroout can easily starve process wanting to do read which does not seem ideal. Somewhat related is that I know that Christoph wanted to modify how we use filemap_invalidate_lock() so that huge zeroouts performed by actually writing zeros will not block readers for ages and this seems to be going in somewhat opposite direction favoring writers even more. * I suspect this is going to upset lockdep (and lock owner tracking for the purposes of spinning) pretty badly because the lock owner may be returning to userspace without unlocking the lock. Honza > --- > block/fops.c | 6 ++++-- > include/linux/blk_types.h | 2 ++ > 2 files changed, 6 insertions(+), 2 deletions(-) > > diff --git a/block/fops.c b/block/fops.c > index 9f2ecec406b0..368866b15e68 100644 > --- a/block/fops.c > +++ b/block/fops.c > @@ -651,7 +651,8 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start, > if ((start | len) & (bdev_logical_block_size(bdev) - 1)) > return -EINVAL; > > - filemap_invalidate_lock(inode->i_mapping); > + if (atomic_inc_return(&bdev->bd_fallocate_count) == 1) > + filemap_invalidate_lock(inode->i_mapping); > > /* Invalidate the page cache, including dirty pages. */ > error = truncate_bdev_range(bdev, file->f_mode, start, end); > @@ -679,7 +680,8 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start, > } > > fail: > - filemap_invalidate_unlock(inode->i_mapping); > + if (!atomic_dec_return(&bdev->bd_fallocate_count)) > + filemap_invalidate_unlock(inode->i_mapping); > return error; > } > > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h > index 1973ef9bd40f..9ccd841ea8ed 100644 > --- a/include/linux/blk_types.h > +++ b/include/linux/blk_types.h > @@ -58,6 +58,8 @@ struct block_device { > struct gendisk * bd_disk; > struct request_queue * bd_queue; > > + atomic_t bd_fallocate_count; > + > /* The counter of freeze processes */ > int bd_fsfreeze_count; > /* Mutex for freeze */ > -- > 2.31.1 > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR