On Thu, Aug 29, 2024 at 10:26:06AM +1000, Dave Chinner wrote:
> On Wed, Aug 28, 2024 at 03:44:20PM -0700, Darrick J. Wong wrote:
> > On Wed, Aug 28, 2024 at 02:19:11PM -0400, Brian Foster wrote:
> > > @@ -1450,19 +1481,27 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> > >  		.flags = IOMAP_ZERO,
> > >  	};
> > >  	int ret;
> > > +	bool range_dirty;
> > >
> > >  	/*
> > >  	 * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
> > >  	 * pagecache must be flushed to ensure stale data from previous
> > > -	 * buffered writes is not exposed.
> > > +	 * buffered writes is not exposed. A flush is only required for certain
> > > +	 * types of mappings, but checking pagecache after mapping lookup is
> > > +	 * racy with writeback and reclaim.
> > > +	 *
> > > +	 * Therefore, check the entire range first and pass along whether any
> > > +	 * part of it is dirty. If so and an underlying mapping warrants it,
> > > +	 * flush the cache at that point. This trades off the occasional false
> > > +	 * positive (and spurious flush, if the dirty data and mapping don't
> > > +	 * happen to overlap) for simplicity in handling a relatively uncommon
> > > +	 * situation.
> > >  	 */
> > > -	ret = filemap_write_and_wait_range(inode->i_mapping,
> > > -			pos, pos + len - 1);
> > > -	if (ret)
> > > -		return ret;
> > > +	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
> > > +					pos, pos + len - 1);
> > >
> > >  	while ((ret = iomap_iter(&iter, ops)) > 0)
> > > -		iter.processed = iomap_zero_iter(&iter, did_zero);
> > > +		iter.processed = iomap_zero_iter(&iter, did_zero, &range_dirty);
> >
> > Style nit: Could we do this flush-and-stale from the loop body instead
> > of passing pointers around? e.g.
> >
> > static inline bool iomap_zero_need_flush(const struct iomap_iter *i)
> > {
> > 	const struct iomap *srcmap = iomap_iter_srcmap(i);
> >
> > 	return srcmap->type == IOMAP_HOLE ||
> > 	       srcmap->type == IOMAP_UNWRITTEN;
> > }
> >
> > static inline int iomap_zero_iter_flush(struct iomap_iter *i)
> > {
> > 	struct address_space *mapping = i->inode->i_mapping;
> > 	loff_t end = i->pos + i->len - 1;
> >
> > 	i->iomap.flags |= IOMAP_F_STALE;
> > 	return filemap_write_and_wait_range(mapping, i->pos, end);
> > }
> >
> > and then:
> >
> > 	range_dirty = filemap_range_needs_writeback(...);
> >
> > 	while ((ret = iomap_iter(&iter, ops)) > 0) {
> > 		if (range_dirty && iomap_zero_need_flush(&iter)) {
> > 			/*
> > 			 * Zero range wants to skip pre-zeroed (i.e.
> > 			 * unwritten) mappings, but...
> > 			 */
> > 			range_dirty = false;
> > 			iter.processed = iomap_zero_iter_flush(&iter);
> > 		} else {
> > 			iter.processed = iomap_zero_iter(&iter, did_zero);
> > 		}
> > 	}
> >
> > The logic looks correct and sensible. :)
>
> Yeah, I think this is better.
>
> However, the one thing that both versions have in common is that
> they don't explain -why- the iomap needs to be marked stale.
> So, something like:
>
> "When we flush the dirty data over the range, the extent state for
> the range will change. We need to know that new state before
> performing any zeroing operations on the range. Hence we mark the
> iomap stale so that the iterator will remap this range and the next
> iteration pass will see the new extent state and perform the correct
> zeroing operation for the range."
>

Sure, I'll update the comments however the factoring ultimately turns
out. Thanks.

Brian

> -Dave.
>
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
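
For illustration, the flush-then-remap behavior described in the quoted
comment text above can be modeled in a small userspace sketch. This is a
toy, not kernel code: every name in it (toy_file, toy_flush,
toy_zero_range, and so on) is hypothetical, each "mapping" covers a single
block, and a flush is assumed to convert a dirty hole or unwritten block
into a mapped one. The "continue" without advancing stands in for marking
the iomap stale, so the next pass looks the block up again and sees the
post-flush extent state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
enum toy_type { TOY_HOLE, TOY_UNWRITTEN, TOY_MAPPED };

#define TOY_BLOCKS 8

struct toy_file {
	enum toy_type extent[TOY_BLOCKS];	/* extent state per block */
	bool dirty[TOY_BLOCKS];			/* dirty pagecache per block */
	char data[TOY_BLOCKS];			/* what a reader sees; 0 == zeroed */
	int flushes;				/* number of flush calls issued */
};

/* Check the whole range once, up front, before any mapping lookup. */
static bool toy_range_dirty(const struct toy_file *f, size_t pos, size_t len)
{
	for (size_t b = pos; b < pos + len; b++)
		if (f->dirty[b])
			return true;
	return false;
}

/* Assumption: writeback turns a dirty hole/unwritten block into a mapped one. */
static void toy_flush(struct toy_file *f, size_t pos, size_t len)
{
	for (size_t b = pos; b < pos + len; b++) {
		if (f->dirty[b]) {
			f->dirty[b] = false;
			f->extent[b] = TOY_MAPPED;
		}
	}
	f->flushes++;
}

static void toy_zero_range(struct toy_file *f, size_t pos, size_t len)
{
	size_t end = pos + len;
	bool range_dirty;

	assert(end <= TOY_BLOCKS);
	range_dirty = toy_range_dirty(f, pos, len);

	while (pos < end) {
		enum toy_type type = f->extent[pos];	/* "mapping lookup" */

		if (range_dirty && type != TOY_MAPPED) {
			/*
			 * The lookup result may be out of date with respect to
			 * dirty pagecache. Flush, then retry this block without
			 * advancing so the new extent state is observed.
			 */
			toy_flush(f, pos, end - pos);
			range_dirty = false;
			continue;
		}

		if (type == TOY_MAPPED)
			f->data[pos] = 0;	/* zero real data */
		/* holes and clean unwritten blocks already read as zero */
		pos++;
	}
}
```

With a dirty unwritten block in the range, the sketch flushes once, picks
up the block as mapped on the retry and zeroes it, and still skips the
clean unwritten and hole blocks. Without the retry, the dirty block would
be skipped as "already zero" and its stale contents would remain visible.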