On Tue, Jun 25, 2019 at 02:14:42PM -0500, Goldwyn Rodrigues wrote: > On 9:07 24/06, Christoph Hellwig wrote: > > xfs will need to be updated to fill in the additional iomap for the > > COW case. Has this series been tested on xfs? > > > > No, I have not tested this, or make xfs set IOMAP_COW. I will try to do > it in the next iteration. AFAICT even if you did absolutely nothing XFS would continue to work properly because iomap_write_begin doesn't actually care if it's going to be a COW write because the only IO it does from the mapping is to read in the non-uptodate parts of the page if the write offset/len aren't page-aligned. > > I can't say I'm a huge fan of this two iomaps in one method call > > approach. I always though two separate iomap iterations would be nicer, > > but compared to that even the older hack with just the additional > > src_addr seems a little better. > > I am just expanding on your idea of using multiple iterations for the Cow case > in the hope we can come out of a good design: > > 1. iomap_file_buffered_write calls iomap_apply with IOMAP_WRITE flag. > which calls iomap_begin() for the respective filesystem. > 2. btrfs_iomap_begin() sets up iomap->type as IOMAP_COW and fills iomap > struct with read addr information. > 3. iomap_apply() conditionally for IOMAP_COW calls do_cow(new function) > and calls ops->iomap_begin() with flag IOMAP_COW_READ_DONE(new flag). Unless I'm misreading this, you don't need a do_cow() or IOMAP_COW_READ_DONE because the page state tracks that for you: iomap_write_begin calls ->iomap_begin to learn from where it should read data if the write is not aligned to a page and the page isn't uptodate. If it's IOMAP_COW then we learn from *srcmap instead of *iomap. (The write actor then dirties the page) fsync() or whatever The mm calls ->writepage. The filesystem grabs the new COW mapping, constructs a bio with the new mapping and dirty pages, and submits the bio. pagesize >= blocksize so we're always writing full blocks. The writeback bio completes and calls ->bio_endio, which is the filesystem's trigger to make the mapping changes permanent, update ondisk file size, etc. For direct writes that are not block-aligned, we just bounce the write to the page cache... ...so it's only dax_iomap_rw where we're going to have to do the COW ourselves. That's simple -- map both addresses, copy the regions before offset and after offset+len, then proceed with writing whatever userspace sent us. No need for the iomap code itself to get involved. > 4. btrfs_iomap_begin() fills up iomap structure with write information. > > Step 3 seems out of place because iomap_apply should be iomap.type agnostic. > Right? > Should we be adding another flag IOMAP_COW_DONE, just to figure out that > this is the "real" write for iomap_begin to fill iomap? > > If this is not how you imagined, could you elaborate on the dual iteration > sequence? --D > > > -- > Goldwyn