"Darrick J. Wong" <djwong@xxxxxxxxxx> writes: > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > Prior to commit a01b8f225248e, we would always read in the contents of a > !uptodate folio prior to writing userspace data into the folio, > allocated a folio state object, etc. Ritesh introduced an optimization > that skips all of that if the write would cover the entire folio. > > Unfortunately, the optimization misses the unshare case, where we always > have to read in the folio contents since there isn't a data buffer > supplied by userspace. This can result in stale kernel memory exposure > if userspace issues a FALLOC_FL_UNSHARE_RANGE call on part of a shared > file that isn't already cached. > > This was caught by observing fstests regressions in the "unshare around" > mechanism that is used for unaligned writes to a reflinked realtime > volume when the realtime extent size is larger than 1FSB, though I think > it applies to any shared file. > > Cc: ritesh.list@xxxxxxxxx, willy@xxxxxxxxxxxxx > Fixes: a01b8f225248e ("iomap: Allocate ifs in ->write_begin() early") > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > --- > fs/iomap/buffered-io.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) Thanks for catching this case. Fix for this looks good to me. I have verified on my setup. w/o this patch it indeed can cause corruption in the unshare case, since we don't read the disk contents and we might end up writing garbage from the page cache. Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index ae8673ce08b1..0350830fc989 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -640,11 +640,13 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos, > size_t poff, plen; > > /* > - * If the write completely overlaps the current folio, then > + * If the write or zeroing completely overlaps the current folio, then > * entire folio will be dirtied so there is no need for > * per-block state tracking structures to be attached to this folio. > + * For the unshare case, we must read in the ondisk contents because we > + * are not changing pagecache contents. > */ > - if (pos <= folio_pos(folio) && > + if (!(iter->flags & IOMAP_UNSHARE) && pos <= folio_pos(folio) && > pos + len >= folio_pos(folio) + folio_size(folio)) > return 0; >