On Wed, Aug 28, 2024 at 02:19:10PM -0400, Brian Foster wrote: > The iomap zero range implementation doesn't properly handle dirty > pagecache over unwritten mappings. It skips such mappings as if they > were pre-zeroed. If some part of an unwritten mapping is dirty in > pagecache from a previous write, the data in cache should be zeroed > as well. Instead, the data is left in cache and creates a stale data > exposure problem if writeback occurs sometime after the zero range. > > Most callers are unaffected by this because the higher level > filesystem contexts that call zero range typically perform a filemap > flush of the target range for other reasons. A couple contexts that > don't otherwise need to flush are write file size extension and > truncate in XFS. The former path is currently susceptible to the > stale data exposure problem and the latter performs a flush > specifically to work around it. > > This is clearly inconsistent and incomplete. As a first step toward > correcting behavior, lift the XFS workaround to iomap_zero_range() > and unconditionally flush the range before the zero range operation > proceeds. While this appears to be a bit of a big hammer, most all > users already do this from calling context save for the couple of > exceptions noted above. Future patches will optimize or elide this > flush while maintaining functional correctness. > > Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure") > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> I wonder why gfs2 (aka the other iomap_zero_range user) doesn't have a truncate-down flush hammer, but maybe it doesn't support unwritten extents? I didn't find anything obvious when I looked, so Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> (but you might want to see if Andreas has any loud objections to this) --D > --- > fs/iomap/buffered-io.c | 10 ++++++++++ > fs/xfs/xfs_iops.c | 10 ---------- > 2 files changed, 10 insertions(+), 10 deletions(-) > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index f420c53d86ac..3e846f43ff48 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -1451,6 +1451,16 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, > }; > int ret; > > + /* > + * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but > + * pagecache must be flushed to ensure stale data from previous > + * buffered writes is not exposed. > + */ > + ret = filemap_write_and_wait_range(inode->i_mapping, > + pos, pos + len - 1); > + if (ret) > + return ret; > + > while ((ret = iomap_iter(&iter, ops)) > 0) > iter.processed = iomap_zero_iter(&iter, did_zero); > return ret; > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c > index 1cdc8034f54d..ddd3697e6ecd 100644 > --- a/fs/xfs/xfs_iops.c > +++ b/fs/xfs/xfs_iops.c > @@ -870,16 +870,6 @@ xfs_setattr_size( > error = xfs_zero_range(ip, oldsize, newsize - oldsize, > &did_zeroing); > } else { > - /* > - * iomap won't detect a dirty page over an unwritten block (or a > - * cow block over a hole) and subsequently skips zeroing the > - * newly post-EOF portion of the page. Flush the new EOF to > - * convert the block before the pagecache truncate. > - */ > - error = filemap_write_and_wait_range(inode->i_mapping, newsize, > - newsize); > - if (error) > - return error; > error = xfs_truncate_page(ip, newsize, &did_zeroing); > } > > -- > 2.45.0 > >