Re: [PATCH v2 2/2] iomap: make zero range flush conditional on unwritten mappings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 28, 2024 at 02:19:11PM -0400, Brian Foster wrote:
> iomap_zero_range() flushes pagecache to mitigate consistency
> problems with dirty pagecache and unwritten mappings. The flush is
> unconditional over the entire range because checking pagecache state
> after mapping lookup is racy with writeback and reclaim. There are
> ways around this using iomap's mapping revalidation mechanism, but
> this is not supported by all iomap based filesystems and so is not a
> generic solution.
> 
> There is another way around this limitation that is good enough to
> filter the flush for most cases in practice. If we check for dirty
> pagecache over the target range (instead of unconditionally flush),
> we can keep track of whether the range was dirty before lookup and
> defer the flush until/unless we see a combination of dirty cache
> backed by an unwritten mapping. We don't necessarily know whether
> the dirty cache was backed by the unwritten maping or some other
> (written) part of the range, but the impliciation of a false
> positive here is a spurious flush and thus relatively harmless.
> 
> Note that we also flush for hole mappings because iomap_zero_range()
> is used for partial folio zeroing in some cases. For example, if a
> folio straddles EOF on a sub-page FSB size fs, the post-eof portion
> is hole-backed and dirtied/written via mapped write, and then i_size
> increases before writeback can occur (which otherwise zeroes the
> post-eof portion of the EOF folio), then the folio becomes
> inconsistent with disk until reclaimed. A flush in this case
> executes partial zeroing from writeback, and iomap knows that there
> is otherwise no I/O to submit for hole backed mappings.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>  fs/iomap/buffered-io.c | 57 +++++++++++++++++++++++++++++++++++-------
>  1 file changed, 48 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 3e846f43ff48..a6e897e6e303 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1393,16 +1393,47 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
> -static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> +/*
> + * Flush the remaining range of the iter and mark the current mapping stale.
> + * This is used when zero range sees an unwritten mapping that may have had
> + * dirty pagecache over it.
> + */
> +static inline int iomap_zero_iter_flush_and_stale(struct iomap_iter *i)
> +{
> +	struct address_space *mapping = i->inode->i_mapping;
> +	loff_t end = i->pos + i->len - 1;
> +
> +	i->iomap.flags |= IOMAP_F_STALE;
> +	return filemap_write_and_wait_range(mapping, i->pos, end);
> +}
> +
> +static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
> +		bool *range_dirty)
>  {
>  	const struct iomap *srcmap = iomap_iter_srcmap(iter);
>  	loff_t pos = iter->pos;
>  	loff_t length = iomap_length(iter);
>  	loff_t written = 0;
>  
> -	/* already zeroed?  we're done. */
> -	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> +	/*
> +	 * We can skip pre-zeroed mappings so long as either the mapping was
> +	 * clean before we started or we've flushed at least once since.
> +	 * Otherwise we don't know whether the current mapping had dirty
> +	 * pagecache, so flush it now, stale the current mapping, and proceed
> +	 * from there.
> +	 *
> +	 * The hole case is intentionally included because this is (ab)used to
> +	 * handle partial folio zeroing in some cases. Hole backed post-eof
> +	 * ranges can be dirtied via mapped write and the flush triggers
> +	 * writeback time post-eof zeroing.
> +	 */
> +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
> +		if (*range_dirty) {
> +			*range_dirty = false;
> +			return iomap_zero_iter_flush_and_stale(iter);
> +		}
>  		return length;
> +	}
>  
>  	do {
>  		struct folio *folio;
> @@ -1450,19 +1481,27 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  		.flags		= IOMAP_ZERO,
>  	};
>  	int ret;
> +	bool range_dirty;
>  
>  	/*
>  	 * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
>  	 * pagecache must be flushed to ensure stale data from previous
> -	 * buffered writes is not exposed.
> +	 * buffered writes is not exposed. A flush is only required for certain
> +	 * types of mappings, but checking pagecache after mapping lookup is
> +	 * racy with writeback and reclaim.
> +	 *
> +	 * Therefore, check the entire range first and pass along whether any
> +	 * part of it is dirty. If so and an underlying mapping warrants it,
> +	 * flush the cache at that point. This trades off the occasional false
> +	 * positive (and spurious flush, if the dirty data and mapping don't
> +	 * happen to overlap) for simplicity in handling a relatively uncommon
> +	 * situation.
>  	 */
> -	ret = filemap_write_and_wait_range(inode->i_mapping,
> -			pos, pos + len - 1);
> -	if (ret)
> -		return ret;
> +	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
> +					pos, pos + len - 1);
>  
>  	while ((ret = iomap_iter(&iter, ops)) > 0)
> -		iter.processed = iomap_zero_iter(&iter, did_zero);
> +		iter.processed = iomap_zero_iter(&iter, did_zero, &range_dirty);

Style nit: Could we do this flush-and-stale from the loop body instead
of passing pointers around?  e.g.

static inline bool iomap_zero_need_flush(const struct iomap_iter *i)
{
	const struct iomap *srcmap = iomap_iter_srcmap(iter);

	return srcmap->type == IOMAP_HOLE ||
	       srcmap->type == IOMAP_UNWRITTEN;
}

static inline int iomap_zero_iter_flush(struct iomap_iter *i)
{
	struct address_space *mapping = i->inode->i_mapping;
	loff_t end = i->pos + i->len - 1;

	i->iomap.flags |= IOMAP_F_STALE;
	return filemap_write_and_wait_range(mapping, i->pos, end);
}

and then:

	range_dirty = filemap_range_needs_writeback(...);

	while ((ret = iomap_iter(&iter, ops)) > 0) {
		if (range_dirty && iomap_zero_need_flush(&iter)) {
			/*
			 * Zero range wants to skip pre-zeroed (i.e.
			 * unwritten) mappings, but...
			 */
			range_dirty = false;
			iter.processed = iomap_zero_iter_flush(&iter);
		} else {
			iter.processed = iomap_zero_iter(&iter, did_zero);
		}
	}

The logic looks correct and sensible. :)

--D

>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(iomap_zero_range);
> -- 
> 2.45.0
> 
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux