On Mon, Apr 01, 2019 at 03:38:52PM +1100, Dave Chinner wrote: > On Tue, Mar 26, 2019 at 02:02:50PM -0500, Goldwyn Rodrigues wrote: > > From: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx> > > > > The IOMAP_F_COW is a flag to notify dax that it needs to copy > > the data from iomap->cow_addr to iomap->addr, if the start/end > > of I/O are not page aligned. > > I see what you are trying to do here, but this is kinda gross. Even more gross than our current hack where for buffered I/O we return the previous address but still allocate delalloc blocks for the COW? ;-) This whole area is a mine field.. > Oh, wait, you're trying to map two ranges in a single iomap and then > infer state from information that got chucked away.... IOWs, you're > doing it wrong - iomap algorithms are driven by how we manipulate > iomaps to do data operations efficiently, not how we copy data page > by page. > > IOWs, what we really should have here is two iomaps - a source > and a destination iomap. The source is a read mapping of the > current address (where we are going to copy the data from), the > destination is the post-cow allocation mapping (where the data > goes). > > Now you just copy the data from one map to the other iterating > source mappings until the necessary range of the destination has > been covered. And you can check if the source map is IOMAP_HOLE or > IOMAP_UNWRITTEN and hence optimise the copy (i.e. zero the new > allocation) before copying in the new data. > > Even better, only do the source mapping if the write isn't > page/filesystem block aligned, and hence only do the slow path > copying if a source mapping exists.... I looked into the do two maps instead of the mentioned above existing gross hack when doing the buffer_head removal for iomap. The major issue with it is that we need a way for the second recursive mapping to not take the inode-wide locks. I still think it is worth it (or move to per-fork or range locks) as it defintively is the much cleaner way to handle it. I just didn't bother to go there yet as I wanted the buffer_head removal and thus kept the existing gross hack for now. But moving to a two mapping scheme for read-modify-write ops would be a major improvement for sure.