Re: [PATCH 04/15] dax: Introduce IOMAP_F_COW for copy-on-write

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Sun, 7 Apr 2019 00:26:41 -0700

On Mon, Apr 01, 2019 at 03:38:52PM +1100, Dave Chinner wrote:
> On Tue, Mar 26, 2019 at 02:02:50PM -0500, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
> > 
> > The IOMAP_F_COW is a flag to notify dax that it needs to copy
> > the data from iomap->cow_addr to iomap->addr, if the start/end
> > of I/O are not page aligned.
> 
> I see what you are trying to do here, but this is kinda gross.

Even more gross than our current hack where for buffered I/O
we return the previous address but still allocate delalloc blocks
for the COW? ;-)

This whole area is a mine field..

> Oh, wait, you're trying to map two ranges in a single iomap and then
> infer state from information that got chucked away.... IOWs, you're
> doing it wrong - iomap algorithms are driven by how we manipulate
> iomaps to do data operations efficiently, not how we copy data page
> by page.
> 
> IOWs, what we really should have here is two iomaps - a source
> and a destination iomap. The source is a read mapping of the
> current address (where we are going to copy the data from), the
> destination is the post-cow allocation mapping (where the data
> goes).
> 
> Now you just copy the data from one map to the other iterating
> source mappings until the necessary range of the destination has
> been covered.  And you can check if the source map is IOMAP_HOLE or
> IOMAP_UNWRITTEN and hence optimise the copy (i.e. zero the new
> allocation) before copying in the new data.
> 
> Even better, only do the source mapping if the write isn't
> page/filesystem block aligned, and hence only do the slow path
> copying if a source mapping exists....

I looked into the do two maps instead of the mentioned above existing
gross hack when doing the buffer_head removal for iomap.  The major
issue with it is that we need a way for the second recursive mapping
to not take the inode-wide locks.  I still think it is worth it
(or move to per-fork or range locks) as it defintively is the much
cleaner way to handle it.  I just didn't bother to go there yet
as I wanted the buffer_head removal and thus kept the existing gross
hack for now.  But moving to a two mapping scheme for read-modify-write
ops would be a major improvement for sure.