Re: [PATCH 1/2] iomap: don't skip reading in !uptodate folios when unsharing a range

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Mon, 18 Sep 2023 22:24:34 -0700



On Tue, Sep 19, 2023 at 10:44:58AM +0530, Ritesh Harjani wrote:
> "Darrick J. Wong" <djwong@xxxxxxxxxx> writes:
> 
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> >
> > Prior to commit a01b8f225248e, we would always read in the contents of a
> > !uptodate folio prior to writing userspace data into the folio,
> > allocated a folio state object, etc.  Ritesh introduced an optimization
> > that skips all of that if the write would cover the entire folio.
> >
> > Unfortunately, the optimization misses the unshare case, where we always
> > have to read in the folio contents since there isn't a data buffer
> > supplied by userspace.  This can result in stale kernel memory exposure
> > if userspace issues a FALLOC_FL_UNSHARE_RANGE call on part of a shared
> > file that isn't already cached.
> >
> > This was caught by observing fstests regressions in the "unshare around"
> > mechanism that is used for unaligned writes to a reflinked realtime
> > volume when the realtime extent size is larger than 1FSB,
> 
> I was wondering what is testcase that you are referring here to? 
> Can you please tell the testcase no. and the mkfs / mount config options
> which I can use to observe the regression please?

https://lore.kernel.org/linux-fsdevel/169507871947.772278.5767091361086740046.stgit@frogsfrogsfrogs/T/#m8081f74f4f1fcb862399aa1544be082aabe56765

(any xfs config with reflink enabled)

--D

> > though I think it applies to any shared file.
> >
> > Cc: ritesh.list@xxxxxxxxx, willy@xxxxxxxxxxxxx
> > Fixes: a01b8f225248e ("iomap: Allocate ifs in ->write_begin() early")
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > ---
> >  fs/iomap/buffered-io.c |    6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index ae8673ce08b1..0350830fc989 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -640,11 +640,13 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
> >  	size_t poff, plen;
> >  
> >  	/*
> > -	 * If the write completely overlaps the current folio, then
> > +	 * If the write or zeroing completely overlaps the current folio, then
> >  	 * entire folio will be dirtied so there is no need for
> >  	 * per-block state tracking structures to be attached to this folio.
> > +	 * For the unshare case, we must read in the ondisk contents because we
> > +	 * are not changing pagecache contents.
> >  	 */
> > -	if (pos <= folio_pos(folio) &&
> > +	if (!(iter->flags & IOMAP_UNSHARE) && pos <= folio_pos(folio) &&
> >  	    pos + len >= folio_pos(folio) + folio_size(folio))
> >  		return 0;
> >