On Wed, Oct 21, 2020 at 09:33:29AM -0400, Brian Foster wrote: > iomap_truncate_page() relies on zero range and zero range > unconditionally skips unwritten mappings. This is normally not a > problem as most users synchronize in-core state to the underlying > block mapping by flushing pagecache prior to calling into iomap. > This is not the case for iomap_truncate_page(), however. XFS calls > iomap_truncate_page() on truncate down before flushing the new EOF > page of the file. This means that if the new EOF block is unwritten > but covered by a dirty or writeback page (i.e. awaiting unwritten > conversion after writeback), iomap fails to zero that page. The > subsequent truncate_setsize() call does perform page zeroing, but > doesn't dirty the page. Therefore if the new EOF page is written > back after calling into iomap but before the pagecache truncate, the > post-EOF zeroing is lost on page reclaim. This exposes stale > post-EOF data on mapped reads. > > Rework iomap_truncate_page() to check pagecache state before calling > into iomap_apply() and use that info to determine whether we can > safely skip zeroing unwritten extents. The filesystem has locked out > concurrent I/O and mapped operations at this point but is not > serialized against writeback, unwritten extent conversion (I/O > completion) or page reclaim. Therefore if a page does not exist > before we acquire the mapping, we can be certain that an unwritten > extent cannot be converted before we return and thus it is safe to > skip. If a page does exist over an unwritten block, it could be in > the dirty or writeback states, convert the underlying mapping at any > time, and thus should be explicitly written to avoid racing with > writeback. Finally, since iomap_truncate_page() only targets the new > EOF block and must now pass additional state to the actor, open code > the zeroing actor instead of plumbing through zero range. > > This does have the tradeoff that an existing clean page is dirtied > and causes unwritten conversion, but this is analogous to historical > behavior implemented by block_truncate_page(). This patch restores > historical behavior to address the data exposure problem and leaves > filtering out the clean page case for a separate patch. > Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path") > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > > v2: > - Rework to check for cached page explicitly and avoid use of seek data. > v1: https://lore.kernel.org/linux-fsdevel/20201012140350.950064-1-bfoster@xxxxxxxxxx/ Has the reproducer listed in that email been turned into a fstest case yet? :) > > fs/iomap/buffered-io.c | 41 ++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 40 insertions(+), 1 deletion(-) > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index bcfc288dba3f..2cdfcff02307 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -1000,17 +1000,56 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, > } > EXPORT_SYMBOL_GPL(iomap_zero_range); > > +struct iomap_trunc_priv { > + bool *did_zero; > + bool has_page; > +}; > + > +static loff_t > +iomap_truncate_page_actor(struct inode *inode, loff_t pos, loff_t count, > + void *data, struct iomap *iomap, struct iomap *srcmap) > +{ > + struct iomap_trunc_priv *priv = data; > + unsigned offset; > + int status; > + > + if (srcmap->type == IOMAP_HOLE) > + return count; > + if (srcmap->type == IOMAP_UNWRITTEN && !priv->has_page) > + return count; > + > + offset = offset_in_page(pos); > + if (IS_DAX(inode)) > + status = dax_iomap_zero(pos, offset, count, iomap); > + else > + status = iomap_zero(inode, pos, offset, count, iomap, srcmap); > + if (status < 0) > + return status; > + > + if (priv->did_zero) > + *priv->did_zero = true; > + return count; > +} > + > int > iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, > const struct iomap_ops *ops) > { > + struct iomap_trunc_priv priv = { .did_zero = did_zero }; > unsigned int blocksize = i_blocksize(inode); > unsigned int off = pos & (blocksize - 1); > + loff_t ret; > > /* Block boundary? Nothing to do */ > if (!off) > return 0; > - return iomap_zero_range(inode, pos, blocksize - off, did_zero, ops); > + > + priv.has_page = filemap_range_has_page(inode->i_mapping, pos, pos); Er... shouldn't that second 'pos' be 'pos + blocksize - off - 1', like the apply call below? I guess it doesn't matter since we're only interested in the page at pos, but the double usage of pos caught my eye. I also wonder, can you move this into the actor so that you can pass *did_zero straight through without the two-item struct? --D > + ret = iomap_apply(inode, pos, blocksize - off, IOMAP_ZERO, ops, &priv, > + iomap_truncate_page_actor); > + if (ret <= 0) > + return ret; > + return 0; > } > EXPORT_SYMBOL_GPL(iomap_truncate_page); > > -- > 2.25.4 >