Re: [PATCH] bdev: Do not return EBUSY if bdev discard races with write

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2021-01-07 at 17:48 +0200, Maxim Levitsky wrote:
> On Thu, 2021-01-07 at 16:40 +0100, Jan Kara wrote:
> > blkdev_fallocate() tries to detect whether a discard raced with an
> > overlapping write by calling invalidate_inode_pages2_range(). However
> > this check can give both false negatives (when writing using direct IO
> > or when writeback already writes out the written pagecache range) and
> > false positives (when write is not actually overlapping but ends in the
> > same page when blocksize < pagesize). This actually causes issues for
> > qemu which is getting confused by EBUSY errors.
> > 
> > Fix the problem by removing this conflicting write detection since it is
> > inherently racy and thus of little use anyway.
> > 
> > Reported-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > CC: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx>
> > Link: https://lore.kernel.org/qemu-devel/20201111153913.41840-1-mlevitsk@xxxxxxxxxx
> > Signed-off-by: Jan Kara <jack@xxxxxxx>
> > ---
> >  fs/block_dev.c | 10 ++++------
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 3e5b02f6606c..a97f43b49839 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -1797,13 +1797,11 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> >  		return error;
> >  
> >  	/*
> > -	 * Invalidate again; if someone wandered in and dirtied a page,
> > -	 * the caller will be given -EBUSY.  The third argument is
> > -	 * inclusive, so the rounding here is safe.
> > +	 * Invalidate the page cache again; if someone wandered in and dirtied
> > +	 * a page, we just discard it - userspace has no way of knowing whether
> > +	 * the write happened before or after discard completing...
> >  	 */
> > -	return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
> > -					     start >> PAGE_SHIFT,
> > -					     end >> PAGE_SHIFT);
> > +	return truncate_bdev_range(bdev, file->f_mode, start, end);
> >  }
> 
> But what happens if write and discard don't overlap? Won't we
> discard the written data in this case?

Ah, I see, the truncate_bdev_range preserves the partial
areas that are not included in the range.

In this case this indeed looks right.

Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>

Best regards,
	Maxim Levitsky


> 
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> >  
> >  const struct file_operations def_blk_fops = {





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux