Re: [PATCH] xfs: properly serialise fallocate against AIO+DIO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 28, 2019 at 09:19:08PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 29, 2019 at 02:48:50PM +1100, Dave Chinner wrote:
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -1040,6 +1040,7 @@ xfs_unmap_extent(
> >  	goto out_unlock;
> >  }
> >  
> > +/* Caller must first wait for the completion of any pending DIOs if required. */
> >  int
> >  xfs_flush_unmap_range(
> >  	struct xfs_inode	*ip,
> > @@ -1051,9 +1052,6 @@ xfs_flush_unmap_range(
> >  	xfs_off_t		rounding, start, end;
> >  	int			error;
> >  
> > -	/* wait for the completion of any pending DIOs */
> > -	inode_dio_wait(inode);
> 
> Does xfs_reflink_remap_prep still need this function to call inode_dio_wait
> before zapping the page cache prior to reflinking into an existing file?

No, because that is done in generic_remap_file_range_prep() after we
have locked the inodes and broken leases in
xfs_reflink_remap_prep().

> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 525b29b99116..865543e41fb4 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -817,6 +817,36 @@ xfs_file_fallocate(
> >  	if (error)
> >  		goto out_unlock;
> >  
> > +	/*
> > +	 * Must wait for all AIO to complete before we continue as AIO can
> > +	 * change the file size on completion without holding any locks we
> > +	 * currently hold. We must do this first because AIO can update both
> > +	 * the on disk and in memory inode sizes, and the operations that follow
> > +	 * require the in-memory size to be fully up-to-date.
> > +	 */
> > +	inode_dio_wait(inode);
> > +
> > +	/*
> > +	 * Now AIO and DIO has drained we flush and (if necessary) invalidate
> > +	 * the cached range over the first operation we are about to run.
> > +	 *
> > +	 * We care about zero and collapse here because they both run a hole
> > +	 * punch over the range first. Because that can zero data, and the range
> > +	 * of invalidation for the shift operations is much larger, we still do
> > +	 * the required flush for collapse in xfs_prepare_shift().
> > +	 *
> > +	 * Insert has the same range requirements as collapse, and we extend the
> > +	 * file first which can zero data. Hence insert has the same
> > +	 * flush/invalidate requirements as collapse and so they are both
> > +	 * handled at the right time by xfs_prepare_shift().
> > +	 */
> > +	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE |
> > +		    FALLOC_FL_COLLAPSE_RANGE)) {
> 
> Er... "Insert has the same requirements as collapse", but we don't test
> for that here?  Also ... xfs_prepare_shift handles flushing for both
> collapse and insert range, but we still have to flush here for collapse?
>
> <confused but suspecting this has something to do with the fact that we
> only do insert range after updating the isize?>

Yes, exactly.

The flush for collapse here is for the hole punch part of collapse,
before we start shifting extents. insert does not hole punch, so it
doesn't need flushing here but it still needs flush/inval before
shifting. i.e.:

collapse				insert

flush_unmap(off, len)
punch hole(off, len)			extends EOF
  writes zeros around (off,len)		  writes zeros around EOF
collapse(off, len)			insert(off, len)
  flush_unmap(off, EOF)			  flush_unmap(off, EOF)
  shift extents down			  shift extents up

So once we start the actual extent shift operation (up or down)
the flush/unmap requirements are identical.

> I think the third paragraph of the comment is just confusing me more.
> Does the following describe what's going on?
> 
> "Insert range has the same range [should this be "page cache flushing"?]
> requirements as collapse.  Because we can zero data as part of extending
> the file size, we skip the flush here and let the flush in
> xfs_prepare_shift take care of invalidating the page cache." ?

It's a bit better - that's kinda what I was trying to describe - but
I'll try to reword it more clearly after I've let it settle in my
head for a little while....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux