On Fri, Jun 21, 2019 at 02:28:24PM -0500, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx> Commit message needed here... > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx> > --- > fs/iomap.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/fs/iomap.c b/fs/iomap.c > index 6648957af268..8a7b20e432ef 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -655,7 +655,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, > > static int > iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, > - struct page **pagep, struct iomap *iomap) > + struct page **pagep, struct iomap *iomap, struct iomap *srcmap) > { > const struct iomap_page_ops *page_ops = iomap->page_ops; > pgoff_t index = pos >> PAGE_SHIFT; > @@ -681,6 +681,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, > > if (iomap->type == IOMAP_INLINE) > iomap_read_inline_data(inode, page, iomap); > + else if (iomap->type == IOMAP_COW) > + status = __iomap_write_begin(inode, pos, len, page, srcmap); Pardon my stream of consciousness while I try to reason about this change... Hmm. For writes to the page cache (which aren't necessarily aligned to a page granularity), this part of the iomap code has used whatever iomap the fs provides to read in whatever page contents are needed from disk so that we can do a read-modify-write through the page cache. For XFS this means that we (almost*) always report data fork extents in response to a write query, even if the write would be COW, because we know that we won't need the cow fork mapping until writeback. This has led to the sort of funny situation where an (IOMAP_WRITE|IOMAP_DIRECT) request will return the COW fork extent, but an (IOMAP_WRITE) request returns the data fork extent. (* "almost", because we will sometimes return a cow fork extent if the data fork is a hole and the file has extent size hints enabled. We're safe from reading in stale disk contents because cow fork extents do not achieve written status until writeback completes, and the page stays locked so we can't write and writeback it simultaneously) I /think/ this finally enables us to fix that weird quirk of the xfs iomap methods, because now we always report the write address and we always report the read address if the actor is supposed to do a read-modify-write. It's the actor's responsibilty to sort that out, not the ->iomap_begin function's. --D > else if (iomap->flags & IOMAP_F_BUFFER_HEAD) > status = __block_write_begin_int(page, pos, len, NULL, iomap); > else > @@ -833,7 +835,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > } > > status = iomap_write_begin(inode, pos, bytes, flags, &page, > - iomap); > + iomap, srcmap); > if (unlikely(status)) > break; > > @@ -932,7 +934,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > return PTR_ERR(rpage); > > status = iomap_write_begin(inode, pos, bytes, > - AOP_FLAG_NOFS, &page, iomap); > + AOP_FLAG_NOFS, &page, iomap, srcmap); > put_page(rpage); > if (unlikely(status)) > return status; > @@ -978,13 +980,13 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len, > EXPORT_SYMBOL_GPL(iomap_file_dirty); > > static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset, > - unsigned bytes, struct iomap *iomap) > + unsigned bytes, struct iomap *iomap, struct iomap *srcmap) > { > struct page *page; > int status; > > status = iomap_write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page, > - iomap); > + iomap, srcmap); > if (status) > return status; > > @@ -1022,7 +1024,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, > if (IS_DAX(inode)) > status = iomap_dax_zero(pos, offset, bytes, iomap); > else > - status = iomap_zero(inode, pos, offset, bytes, iomap); > + status = iomap_zero(inode, pos, offset, bytes, iomap, srcmap); > if (status < 0) > return status; > > -- > 2.16.4 >