On 10:06 02/04, Dave Chinner wrote: > On Mon, Apr 01, 2019 at 04:41:02PM -0500, Goldwyn Rodrigues wrote: > > On 15:38 01/04, Dave Chinner wrote: > > > On Tue, Mar 26, 2019 at 02:02:50PM -0500, Goldwyn Rodrigues wrote: > > > > From: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx> > > > > > > > > The IOMAP_F_COW is a flag to notify dax that it needs to copy > > > > the data from iomap->cow_addr to iomap->addr, if the start/end > > > > of I/O are not page aligned. > > > > > > I see what you are trying to do here, but this is kinda gross. > > > > > > > This also introduces dax_to_dax_copy() which performs a copy > > > > from one part of the device to another, to a maximum of one page. > > > > > > > > Question: Using iomap.cow_addr == 0 means the CoW is to be copied > > > > (or memset) from a hole. Would this be better handled through a flag? > > > > > > That's what all these checks in the iomap code do: > > > > > > > This is using iomap->flags not type. > > Yes, I know. The fact that you tell me this (when it was obvious) > indicates to me that you didn't understand what I was saying. > > i.e. the gross hack here is that this patch is trying to define a > new iomap type - both behaviourally and iomap content - via adding > a modifier flag rather than defining a new iomap->type. That's the > gross hack, and everything stems from that. > > i.e. the "bloating" of the struct iomap is caused because the flag > modifier (IOMAP_F_COW) can't use parts of the iomap that are defined > for specific iomap types. e.g. IOMAP_INLINE type uses ->inline_data, > and so it can't be re-used by a iomap flag modifier such as > IOMAP_F_COW. > > However, if we define a new type for this "need multiple mappings" > iomap rather than a flag, we don't need any new fields in the struct > iomap because we can use what already exists in the iomap. > > > > if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN) > > > > > > Oh, wait, you're trying to map two ranges in a single iomap and then > > > infer state from information that got chucked away.... IOWs, you're > > > doing it wrong - iomap algorithms are driven by how we manipulate > > > iomaps to do data operations efficiently, not how we copy data page > > > by page. > > > > > > IOWs, what we really should have here is two iomaps - a source > > > and a destination iomap. The source is a read mapping of the > > > current address (where we are going to copy the data from), the > > > destination is the post-cow allocation mapping (where the data > > > goes). > > > > > > Now you just copy the data from one map to the other iterating > > > source mappings until the necessary range of the destination has > > > been covered. And you can check if the source map is IOMAP_HOLE or > > > IOMAP_UNWRITTEN and hence optimise the copy (i.e. zero the new > > > allocation) before copying in the new data. > > > > Won't that be inefficient? With CoW we only need to write the first > > and last block. > > You're assuming that partial data overwrites are the only case where > this dax-to-dax copy of a file range is required. That assumption is > false. > > i.e. FALLOC_FL_UNSHARE_RANGE for DAX requires iterating over the > entire source range and copying all the contents to the newly > allocated destination range. The partial block copying is just a > short version of this limited to a single block. > > Sure, btrfs doesn't support FALLOC_FL_UNSHARE_RANGE, but if you're > going to be adding support for reflink to DAX, the infrastructure > needs to provide support for performing FALLOC_FL_UNSHARE_RANGE > to break extent sharing efficiently. > > > Again, that is not required if the offset/end > > offset is block aligned. After that, it falls back to the > > regular write path of performing dax_copy_to_iter(). > > We don't deal with IOMAP_UNWRITTEN in dax, > > Yes we do. fallocate() can lay down unwritten extents, and we can > both read and write to them. See, for example, dax_iomap_actor() > called from dax_iomap_rw() via iomap_apply() - it does not call > dax_copy_to_iter() for reads if the range is IOMAP_HOLE or > IOMAP_UNWRITTEN. > > > though other > > filesystems in the future may use CoW without dax. > > That makes no sense. :/ > > > Besides, what you are suggesting will not fit well with the > > current iomap iterator code and would require another function > > altogether. > > I'm not concerned about that - I would much prefer we do things > cleanly and properly rather than make expedient hacks for > questionable benefit that we'll have to completely rework or remove > the moment we implement DAX+reflink support in XFS. > > > After Darrick's suggestion, we can even do away with cow_pos, so > > only the read address of cow_addr will exist. > > As I mentioned earlier, even that is not necessary. > > This is DAX - the iomap API and mapping functions can already return > pointers to inline data, and DAX can effectively be considered > inline data for the purposes of reading data. > > As I said, the problem here is you are trying to use flags to define > a new type of iomap operation requires two mappings rather than one. > IMO, we should be defining an IOMAP_DAX_COW /type/ and then define > it to contain and behave as follows: > > - new destination region for data to be copied into is the > same setup as IOMAP_MAPPED > - existing shared data that may be needed for reading is > mapped direct to device address by ->iomap_begin into > iomap->inline_data > - if the iomap infrastructure needs to copy original source > data into destination, it copies directly from the memory > address in iomap->inline_data into the directly mapped DAX > desitination via memcpy(). > > This covers both the partial write COW case you are concerned with > here, and the full-extent range copy case that > FALLOC_FL_UNSHARE_RANGE requires. > Ok, understood. However, we may have to differentiate between a CoW with FALLOC_FL_UNSHARE_RANGE because CoW will only (conditionally) copy the first and the last block wheras FALLOC_FL_UNSHARE_RANGE will copy the whole range. So, should it be another type? Also, this would require iomap_begin to translate block->dax address which I suppose can be a function in the dax code. -- Goldwyn