Re: [PATCH 3/6] xfs: Don't use unwritten extents for DAX

On Fri, Oct 30, 2015 at 08:36:57AM -0400, Brian Foster wrote:
> On Fri, Oct 30, 2015 at 10:37:56AM +1100, Dave Chinner wrote:
> > On Thu, Oct 29, 2015 at 10:29:50AM -0400, Brian Foster wrote:
> > > On Mon, Oct 19, 2015 at 02:27:15PM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > 
> ...
> > > > +	/*
> > > > +	 * For DAX, we do not allocate unwritten extents, but instead we zero
> > > > +	 * the block before we commit the transaction.  Ideally we'd like to do
> > > > +	 * this outside the transaction context, but if we commit and then crash
> > > > +	 * we may not have zeroed the blocks and this will be exposed on
> > > > +	 * recovery of the allocation. Hence we must zero before commit.
> > > > +	 * Further, if we are mapping unwritten extents here, we need to zero
> > > > +	 * and convert them to written so that we don't need an unwritten extent
> > > > +	 * callback for DAX. This also means that we need to be able to dip into
> > > > +	 * the reserve block pool if there is no space left but we need to do
> > > > +	 * unwritten extent conversion.
> > > > +	 */
> > > > +	if (IS_DAX(VFS_I(ip))) {
> > > > +		bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
> > > > +		tp->t_flags |= XFS_TRANS_RESERVE;
> > > > +	}
> > > 
> > > Am I following the commit log description correctly in that block
> > > zeroing is only required for DAX faults? Do we zero blocks for DAX DIO
> > > as well to be consistent, or is that also required (because it looks
> > > like we still have end_io completion for dio writes anyways)?
> > 
> > DAX DIO will do the zeroing rather than using unwritten extents,
> > too. But we still have DIO IO completion as that needs to do file
> > size updates.
> > 
> 
> Right, my question is: is the DAX DIO zeroing required to avoid the
> races described as the purpose for this patch, or is this just here as a
> simplification? In other words, why not do block zeroing only for DAX
> faults and not DAX/DIO?

Because the only reason the DIO code does 'allocate unwritten;
convert unwritten on IO completion' is so that if we have:

	allocate
	trans_commit
	....				log force
					journal IO submit
	....				journal IO completion
	submit data io
	crash

we don't expose allocated blocks containing stale data to userspace
via recovery. The allocation uses unwritten extents to ensure that
if the allocation is recovered without the corresponding completion,
it reads as zeros rather than whatever was previously on disk in that
location.

For DAX, we can zero the blocks inside the allocation transaction
for direct IO, and hence even if the above sequence occurs, we'll
only ever expose zeros. Hence we don't need unwritten extents in the
DIO path to avoid stale data exposure, and so we can simply avoid
all the extra overhead of unwritten extent conversion on
completion...

> I ask because my understanding is the purpose of this patch is a special
> atomic zeroed allocation requirement just for mmap.

The requirement is set by DAX+mmap; the implementation is a generic
"allocate zeroed blocks" mechanism that can be applied to any
allocation that currently uses unwritten extents to get zeroed
blocks, wherever zeroing is more efficient than using unwritten
extents....

> Unless there is some
> special mixed dio/mmap case I'm missing, doing so for DAX/DIO basically
> causes a clear_pmem() over every page sized chunk of the target I/O
> range for which we already have the data.

I don't follow - this only zeros blocks when we do allocation of new
blocks or overwrite unwritten extents, not on blocks which we
already have written data extents allocated for...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
