[PATCH RFC 0/4] iomap: zero dirty folios over unwritten mappings on zero range

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all,

This is a stab at fixing the iomap zero range problem where it doesn't
correctly handle the case of an unwritten mapping with dirty pagecache.
The gist is that we scan the mapping for dirty cache, zero any
already-dirty folios via buffered writes as normal, but then otherwise
skip clean ranges once we have a chance to validate those ranges against
races with writeback or reclaim.

This is somewhat simplistic in terms of how it scans, but that is
intentional based on the existing use cases for zero range. From poking
around a bit, my current sense is that there isn't any user of zero
range that would ever expect to see more than a single dirty folio. Most
callers either straddle the EOF folio or flush in higher level code for
presumably (fs) context specific reasons. If somebody has an example to
the contrary, please let me know because I'd love to be able to use it
for testing.

The caveat to this approach is that it only works for filesystems that
implement folio_ops->iomap_valid(), which is currently just XFS. GFS2
doesn't use ->iomap_valid() and does call zero range, but AFAICT it
doesn't actually export unwritten mappings so I suspect this is not a
problem. My understanding is that ext4 iomap support is in progress, but
I've not yet dug into what that looks like (though I suspect similar to
XFS). The concern is mainly that this leaves a landmine for fs that
might grow support for unwritten mappings && zero range but not
->iomap_valid(). We'd likely never know zero range was broken for such
fs until stale data exposure problems start to materialize.

I considered adding a fallback to just add a flush at the top of
iomap_zero_range() so at least all future users would be correct, but I
wanted to gate that on the absence of ->iomap_valid() and folio_ops
isn't provided until iomap_begin() time. I suppose another way around
that could be to add a flags param to iomap_zero_range() where the
caller could explicitly opt out of a flush, but that's still kind of
ugly. I dunno, maybe better than nothing..?

So IMO, this raises the question of whether this is just unnecessarily
overcomplicated. The KISS principle implies that it would also be
perfectly fine to do a conditional "flush and stale" in zero range
whenever we see the combination of an unwritten mapping and dirty
pagecache (the latter checked before or during ->iomap_begin()). That's
simple to implement and AFAICT would work/perform adequately and
generically for all filesystems. I have one or two prototypes of this
sort of thing if folks want to see it as an alternative.

Otherwise, this series has so far seen some light regression and
targeted testing. Patches 1-2 are factoring dependencies, patch 3
implements the main fix, and patch 4 drops the flush from XFS truncate.
Thoughts, reviews, flames appreciated.

Brian

Brian Foster (4):
  filemap: return pos of first dirty folio from range_has_writeback
  iomap: refactor an iomap_revalidate() helper
  iomap: fix handling of dirty folios over unwritten extents
  xfs: remove unnecessary flush of eof page from truncate

 fs/iomap/buffered-io.c  | 81 ++++++++++++++++++++++++++++++++++-------
 fs/xfs/xfs_iops.c       | 10 -----
 include/linux/pagemap.h |  4 +-
 mm/filemap.c            |  8 ++--
 4 files changed, 75 insertions(+), 28 deletions(-)

-- 
2.45.0





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux