Re: [PATCH 1/2] iomap: don't skip reading in !uptodate folios when unsharing a range

Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> · Tue, 19 Sep 2023 10:12:19 +0530

"Darrick J. Wong" <djwong@xxxxxxxxxx> writes:

> From: Darrick J. Wong <djwong@xxxxxxxxxx>
>
> Prior to commit a01b8f225248e, we would always read in the contents of a
> !uptodate folio prior to writing userspace data into the folio,
> allocated a folio state object, etc.  Ritesh introduced an optimization
> that skips all of that if the write would cover the entire folio.
>
> Unfortunately, the optimization misses the unshare case, where we always
> have to read in the folio contents since there isn't a data buffer
> supplied by userspace.  This can result in stale kernel memory exposure
> if userspace issues a FALLOC_FL_UNSHARE_RANGE call on part of a shared
> file that isn't already cached.
>
> This was caught by observing fstests regressions in the "unshare around"
> mechanism that is used for unaligned writes to a reflinked realtime
> volume when the realtime extent size is larger than 1FSB, though I think
> it applies to any shared file.
>
> Cc: ritesh.list@xxxxxxxxx, willy@xxxxxxxxxxxxx
> Fixes: a01b8f225248e ("iomap: Allocate ifs in ->write_begin() early")
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---
>  fs/iomap/buffered-io.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)

Thanks for catching this case. Fix for this looks good to me. 
I have verified on my setup. w/o this patch it indeed can cause
corruption in the unshare case, since we don't read the disk contents
and we might end up writing garbage from the page cache.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>

>
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index ae8673ce08b1..0350830fc989 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -640,11 +640,13 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  	size_t poff, plen;
>  
>  	/*
> -	 * If the write completely overlaps the current folio, then
> +	 * If the write or zeroing completely overlaps the current folio, then
>  	 * entire folio will be dirtied so there is no need for
>  	 * per-block state tracking structures to be attached to this folio.
> +	 * For the unshare case, we must read in the ondisk contents because we
> +	 * are not changing pagecache contents.
>  	 */
> -	if (pos <= folio_pos(folio) &&
> +	if (!(iter->flags & IOMAP_UNSHARE) && pos <= folio_pos(folio) &&
>  	    pos + len >= folio_pos(folio) + folio_size(folio))
>  		return 0;
>