On 11:00 26/06, Darrick J. Wong wrote: > On Tue, Jun 25, 2019 at 02:14:42PM -0500, Goldwyn Rodrigues wrote: > > On 9:07 24/06, Christoph Hellwig wrote: > > > xfs will need to be updated to fill in the additional iomap for the > > > COW case. Has this series been tested on xfs? > > > > > > > No, I have not tested this, or make xfs set IOMAP_COW. I will try to do > > it in the next iteration. > > AFAICT even if you did absolutely nothing XFS would continue to work > properly because iomap_write_begin doesn't actually care if it's going > to be a COW write because the only IO it does from the mapping is to > read in the non-uptodate parts of the page if the write offset/len > aren't page-aligned. > > > > I can't say I'm a huge fan of this two iomaps in one method call > > > approach. I always though two separate iomap iterations would be nicer, > > > but compared to that even the older hack with just the additional > > > src_addr seems a little better. > > > > I am just expanding on your idea of using multiple iterations for the Cow case > > in the hope we can come out of a good design: > > > > 1. iomap_file_buffered_write calls iomap_apply with IOMAP_WRITE flag. > > which calls iomap_begin() for the respective filesystem. > > 2. btrfs_iomap_begin() sets up iomap->type as IOMAP_COW and fills iomap > > struct with read addr information. > > 3. iomap_apply() conditionally for IOMAP_COW calls do_cow(new function) > > and calls ops->iomap_begin() with flag IOMAP_COW_READ_DONE(new flag). > > Unless I'm misreading this, you don't need a do_cow() or > IOMAP_COW_READ_DONE because the page state tracks that for you: > > iomap_write_begin calls ->iomap_begin to learn from where it should read > data if the write is not aligned to a page and the page isn't uptodate. > If it's IOMAP_COW then we learn from *srcmap instead of *iomap. > We were discussing the single iomap structure (without srcmap), but with two iterations, the first to get the read information and the second to get the write information for CoW. > (The write actor then dirties the page) > > fsync() or whatever > > The mm calls ->writepage. The filesystem grabs the new COW mapping, > constructs a bio with the new mapping and dirty pages, and submits the > bio. pagesize >= blocksize so we're always writing full blocks. > > The writeback bio completes and calls ->bio_endio, which is the > filesystem's trigger to make the mapping changes permanent, update > ondisk file size, etc. > > For direct writes that are not block-aligned, we just bounce the write > to the page cache... > > ...so it's only dax_iomap_rw where we're going to have to do the COW > ourselves. That's simple -- map both addresses, copy the regions before > offset and after offset+len, then proceed with writing whatever > userspace sent us. No need for the iomap code itself to get involved. Yes, this is how the latest patch series is working, with two iomap structures in iomap_begin() function parameters. -- Goldwyn