On Fri, Oct 30, 2015 at 08:36:57AM -0400, Brian Foster wrote:
> On Fri, Oct 30, 2015 at 10:37:56AM +1100, Dave Chinner wrote:
> > On Thu, Oct 29, 2015 at 10:29:50AM -0400, Brian Foster wrote:
> > > On Mon, Oct 19, 2015 at 02:27:15PM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > > ...
> > > > +	/*
> > > > +	 * For DAX, we do not allocate unwritten extents, but instead we zero
> > > > +	 * the block before we commit the transaction. Ideally we'd like to do
> > > > +	 * this outside the transaction context, but if we commit and then crash
> > > > +	 * we may not have zeroed the blocks and this will be exposed on
> > > > +	 * recovery of the allocation. Hence we must zero before commit.
> > > > +	 * Further, if we are mapping unwritten extents here, we need to zero
> > > > +	 * and convert them to written so that we don't need an unwritten extent
> > > > +	 * callback for DAX. This also means that we need to be able to dip into
> > > > +	 * the reserve block pool if there is no space left but we need to do
> > > > +	 * unwritten extent conversion.
> > > > +	 */
> > > > +	if (IS_DAX(VFS_I(ip))) {
> > > > +		bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
> > > > +		tp->t_flags |= XFS_TRANS_RESERVE;
> > > > +	}
> > >
> > > Am I following the commit log description correctly in that block
> > > zeroing is only required for DAX faults? Do we zero blocks for DAX DIO
> > > as well to be consistent, or is that also required (because it looks
> > > like we still have end_io completion for dio writes anyways)?
> >
> > DAX DIO will do the zeroing rather than using unwritten extents,
> > too. But we still have DIO IO completion as that needs to do file
> > size updates.
> >
>
> Right, my question is: is the DAX DIO zeroing required to avoid the
> races described as the purpose for this patch, or is this just here as a
> simplification? In other words, why not do block zeroing only for DAX
> faults and not DAX/DIO?

Because the only reason the DIO code does 'allocate unwritten; convert
unwritten on IO completion' is so that if we have:

	allocate
	trans_commit
	....
	log force
	  journal IO submit
	  ....
	  journal IO completion
	submit data io
	crash

We don't expose allocated blocks containing stale data to userspace
via recovery. The allocation uses unwritten extents to ensure that if
the allocation is recovered without the corresponding completion, it
reads as zeros rather than whatever was previously on disk in that
location.

For DAX, we can zero the blocks inside the allocation transaction for
direct IO, and hence even if we have the above happen, we'll only
ever expose zeros. Hence we don't need unwritten extents in the DIO
path to avoid stale data exposure, and so we can simply avoid all
that extra overhead of unwritten extent conversion on completion...

> I ask because my understanding is the purpose of this patch is a special
> atomic zeroed allocation requirement just for mmap.

The requirement is set by DAX+mmap; the implementation is a generic
"allocate zeroed blocks" mechanism. It can be applied to any
allocation that currently uses unwritten extents to get zeroed
blocks, if zeroing is more efficient than using unwritten extents....

> Unless there is some
> special mixed dio/mmap case I'm missing, doing so for DAX/DIO basically
> causes a clear_pmem() over every page sized chunk of the target I/O
> range for which we already have the data.

I don't follow - this only zeros blocks when we allocate new blocks
or overwrite unwritten extents, not blocks for which we already have
written data extents allocated...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
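
To make the crash window above concrete, here is a toy userspace model in C. It is only a sketch and not XFS code: every name in it (toy_block, toy_allocate, the ALLOC_* modes, the STALE fill byte) is hypothetical and invented for illustration. It assumes the allocation transaction has reached the journal and the machine crashes before any data IO is done, then checks what a read after recovery would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 8
#define STALE 0xAA	/* stands in for whatever was previously on disk */

/* Hypothetical extent states; the state is journalled metadata. */
enum extent_state { EXT_HOLE, EXT_UNWRITTEN, EXT_WRITTEN };

/* Hypothetical allocation strategies being compared. */
enum alloc_mode {
	ALLOC_UNWRITTEN,	/* allocate unwritten, convert on IO completion */
	ALLOC_ZEROED,		/* zero inside the transaction, mark written */
	ALLOC_UNSAFE		/* mark written without zeroing (the bug case) */
};

struct toy_block {
	enum extent_state state;		/* survives the crash via recovery */
	unsigned char media[BLOCK_SIZE];	/* bytes actually on the media */
};

void toy_init(struct toy_block *b)
{
	b->state = EXT_HOLE;
	memset(b->media, STALE, BLOCK_SIZE);
}

/* Model of the committed allocation transaction. */
void toy_allocate(struct toy_block *b, enum alloc_mode mode)
{
	assert(b->state == EXT_HOLE);
	switch (mode) {
	case ALLOC_UNWRITTEN:
		b->state = EXT_UNWRITTEN;
		break;
	case ALLOC_ZEROED:
		memset(b->media, 0, BLOCK_SIZE);	/* zero before commit */
		b->state = EXT_WRITTEN;
		break;
	case ALLOC_UNSAFE:
		b->state = EXT_WRITTEN;
		break;
	}
}

/* Reads of holes and unwritten extents return zeros, never the media. */
void toy_read(const struct toy_block *b, unsigned char *out)
{
	if (b->state == EXT_WRITTEN)
		memcpy(out, b->media, BLOCK_SIZE);
	else
		memset(out, 0, BLOCK_SIZE);
}

/* Allocate, "crash" before the data IO, then read back after recovery. */
bool toy_exposes_stale_data_after_crash(enum alloc_mode mode)
{
	struct toy_block b;
	unsigned char buf[BLOCK_SIZE];
	int i;

	toy_init(&b);
	toy_allocate(&b, mode);
	/* crash here: the data write was never submitted */
	toy_read(&b, buf);
	for (i = 0; i < BLOCK_SIZE; i++)
		if (buf[i] != 0)
			return true;
	return false;
}
```

In this model both ALLOC_UNWRITTEN and ALLOC_ZEROED read back zeros after the simulated crash; only ALLOC_UNSAFE returns the stale bytes. The zeroed mode gets there without any conversion step on IO completion, which is the overhead the DAX path avoids.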