On Wed, May 29, 2024 at 05:51:59PM +0800, Zhang Yi wrote: > XXX: how do we detect a iomap containing a cow mapping over a hole > in iomap_zero_iter()? The XFS code implies this case also needs to > zero the page cache if there is data present, so trigger for page > cache lookup only in iomap_zero_iter() needs to handle this case as > well. If there is no data in the page cache and either a whole or unwritten extent it really should not matter what is in the COW fork, a there obviously isn't any data we could zero. If there is data in the page cache for something that is marked as a hole in the srcmap, but we have data in the COW fork due to COW extsize preallocation we'd need to zero it, but as the xfs iomap ops don't return a separate srcmap for that case we should be fine. Or am I missing something? > + * Note: when zeroing unwritten extents, we might have data in the page cache > + * over an unwritten extent. In this case, we want to do a pure lookup on the > + * page cache and not create a new folio as we don't need to perform zeroing on > + * unwritten extents if there is no cached data over the given range. > */ > struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len) > { > fgf_t fgp = FGP_WRITEBEGIN | FGP_NOFS; > > + if (iter->flags & IOMAP_ZERO) { > + const struct iomap *srcmap = iomap_iter_srcmap(iter); > + > + if (srcmap->type == IOMAP_UNWRITTEN) > + fgp &= ~FGP_CREAT; > + } Nit: The comment would probably stand out a little better if it was right next to the IOMAP_ZERO conditional instead of above the function. > + if (status) { > + if (status == -ENOENT) { > + /* > + * Unwritten extents need to have page cache > + * lookups done to determine if they have data > + * over them that needs zeroing. If there is no > + * data, we'll get -ENOENT returned here, so we > + * can just skip over this index. > + */ > + WARN_ON_ONCE(srcmap->type != IOMAP_UNWRITTEN); I'd return -EIO if the WARN_ON triggers. > +loop_continue: While I'm no strange to gotos for loop control something trips me up about jumping to the end of the loop. Here is what I could come up with instead. Not arguing it's objectively better, but I somehow like it a little better: diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 700b22d6807783..81378f7cd8d7ff 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1412,49 +1412,56 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero) bool ret; status = iomap_write_begin(iter, pos, bytes, &folio); - if (status) { - if (status == -ENOENT) { - /* - * Unwritten extents need to have page cache - * lookups done to determine if they have data - * over them that needs zeroing. If there is no - * data, we'll get -ENOENT returned here, so we - * can just skip over this index. - */ - WARN_ON_ONCE(srcmap->type != IOMAP_UNWRITTEN); - if (bytes > PAGE_SIZE - offset_in_page(pos)) - bytes = PAGE_SIZE - offset_in_page(pos); - goto loop_continue; - } + if (status && status != -ENOENT) return status; - } - if (iter->iomap.flags & IOMAP_F_STALE) - break; - offset = offset_in_folio(folio, pos); - if (bytes > folio_size(folio) - offset) - bytes = folio_size(folio) - offset; + if (status == -ENOENT) { + /* + * If we end up here, we did not find a folio in the + * page cache for an unwritten extent and thus can + * skip over the range. + */ + if (WARN_ON_ONCE(srcmap->type != IOMAP_UNWRITTEN)) + return -EIO; - /* - * If the folio over an unwritten extent is clean (i.e. because - * it has been read from), then it already contains zeros. Hence - * we can just skip it. - */ - if (srcmap->type == IOMAP_UNWRITTEN && - !folio_test_dirty(folio)) { - folio_unlock(folio); - goto loop_continue; + /* + * XXX: It would be nice if we could get the offset of + * the next entry in the pagecache so that we don't have + * to iterate one page at a time here. + */ + offset = offset_in_page(pos); + if (bytes > PAGE_SIZE - offset) + bytes = PAGE_SIZE - offset; + } else { + if (iter->iomap.flags & IOMAP_F_STALE) + break; + + offset = offset_in_folio(folio, pos); + if (bytes > folio_size(folio) - offset) + bytes = folio_size(folio) - offset; + + /* + * If the folio over an unwritten extent is clean (i.e. + * because it has only been read from), then it already + * contains zeros. Hence we can just skip it. + */ + if (srcmap->type == IOMAP_UNWRITTEN && + !folio_test_dirty(folio)) { + folio_unlock(folio); + status = -ENOENT; + } } - folio_zero_range(folio, offset, bytes); - folio_mark_accessed(folio); + if (status != -ENOENT) { + folio_zero_range(folio, offset, bytes); + folio_mark_accessed(folio); - ret = iomap_write_end(iter, pos, bytes, bytes, folio); - __iomap_put_folio(iter, pos, bytes, folio); - if (WARN_ON_ONCE(!ret)) - return -EIO; + ret = iomap_write_end(iter, pos, bytes, bytes, folio); + __iomap_put_folio(iter, pos, bytes, folio); + if (WARN_ON_ONCE(!ret)) + return -EIO; + } -loop_continue: pos += bytes; length -= bytes; written += bytes;