On Wed, Sep 14, 2016 at 10:29:33PM -0700, Darrick J. Wong wrote: > I'm a little confused about xfs_file_iomap_begin() and IOMAP_WRITE -- > prior to this patchset, it was only called via page_mkwrite and what > used to be write_begin, and all it did was create a delalloc reservation > to back the write (or actually allocate blocks if extsize was set). No, it was called from write itself in addition, of course. > Thinking ahead to integrating reflink with DAX (and presumably > directio) in both cases _file_iomap_begin has to return a real extent. > If the range we're writing is shared, then that means that I'd have to > ensure the range is reserved in the COW fork (the reflink patches already do > this). Next, if the caller requires a firm mapping (i.e. dax or dio) > I'd have to allocate the range in the COW fork and have that mapping > returned to the caller. Yes. No different than the existing buffered write case with an extent size hint, really. > So I guess it would look something like this: > > /* reserve reflink cow range like the reflink patches already do */ > if (flags & (IOMAP_WRITE | IOMAP_ZERO) && xfs_is_reflink_inode(ip)) > xfs_reflink_reserve_cow_range(ip, offset, length); For zeroing we don't need to reserve anything IFF we are finding a hole, so this is too early. > /* do the cow thing if need be */ > if ((flags & IOMAP_WRITE) && > xfs_is_reflink_inode(ip) && > (IS_DAX(inode) || (flags & IOMAP_DIO)) { > xfs_reflink_allocate_cow_range(ip, offset, length); > > if (xfs_reflink_is_cow_pending(ip, offset)) > xfs_reflink_find_cow_mapping(ip, offset, &imap, &need_alloc); > else > xfs_bmapi_read(offset, &imap) > } else > xfs_bmapi_read(offset, &imap) /* non-cow io */ We really have three write block allocation variants, of which the first two look basically the same except for I/O completion handling: 1) For DAX we need to allocate real extents and zero them before use. 2) For buffered I/O with an extent size hint or direct I/O we need to allocate unwritten extents (and convert after the I/O succeeded for the range that succeeded) 3) for buffered I/O without and extent size hint we need to created a delayed allocation That logic exists both in the old __xfs_get_blocks and and the new iomap code (although we don't check for the dio case yet). So I don't think we should need to change anything in terms of COW in relation to DAX or direct I/O here. I do however need to fully audit the usage of the reflink calls for the cases 1 and 2. For one it doesn't really seem nessecary to split the reserve/allocate case ehre, and I hope that a lot of the bmap btree looks could be saved, similar to my rewrite of the delalloc case. > Does that look plausible? If _iomap_begin will have one type of caller > that only needs a delalloc reservation and another type that actually > needs a firm mapping, should we make a new IOMAP flag to indicate that > the caller really needs a firm mapping? IS_DAX is fine for detecting > the DAX case as this patchset shows, but what about directio? > > (At this point Dave suggests on IRC that we just make a dio specific > iomap_begin. That eliminates the flag, though it'd be a fairly similar > function.) As explained above, DIO is exaxtly the same as buffered I/O with an extent size hint. We already handle it fine. DAX is almost the same, keyed off a DAX check in lower level code to convert unwritten extents and zero the blocks. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html