Re: [PATCH 1/2] iomap: Do not unshare exents beyond EOF

Julian Sun <sunjunchao2870@xxxxxxxxx> · Mon, 9 Sep 2024 20:15:43 +0800

Hi Brian,

Brian Foster <bfoster@xxxxxxxxxx> 于2024年9月7日周六 03:11写道：
>
> On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun wrote:
> > Attempting to unshare extents beyond EOF will trigger
> > the need zeroing case, which in turn triggers a warning.
> > Therefore, let's skip the unshare process if extents are
> > beyond EOF.
> >
> > Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
> > Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
> > Inspired-by: Dave Chinner <david@xxxxxxxxxxxxx>
> > Signed-off-by: Julian Sun <sunjunchao2870@xxxxxxxxx>
> > ---
> >  fs/iomap/buffered-io.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index f420c53d86ac..8898d5ec606f 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1340,6 +1340,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
> >       /* don't bother with holes or unwritten extents */
> >       if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> >               return length;
> > +     /* don't try to unshare any extents beyond EOF. */
> > +     if (pos > i_size_read(iter->inode))
> > +             return length;
>
> Hi Julian,
>
>
> > What about if pos starts within EOF and the operation extends beyond it?

Extents within EOF will be unshared as usual. Details are below.
>
> > I ask because I think I've reproduced this scenario, though it is a bit
> > tricky and has dependencies...
> >
> > For one, it seems to depend on the cowblocks patch I recently posted
> > here [1] (though I don't think this is necessarily a problem with the
> > patch, it just depends on keeping COW fork blocks around after the
> > unshare). With that, I reproduce via fsx with unshare range support [2]
> > using the ops file appended below [3] on a -bsize=1k XFS fs.
> >
> > I haven't quite characterized the full sequence other than it looks like
> > the unshare walks across EOF with a shared data fork block and COW fork
> > delalloc, presumably finds the post-eof part of the folio !uptodate (so
> > iomap_adjust_read_range() doesn't skip it), and then trips over the
> > warning and error return associated with the folio zeroing in
> > __iomap_write_begin().

The scenario has already been reproduced by syzbot[1]. The reproducer
provided by syzbot constructed the following extent maps for a file of
size 0xE00 before fallocate unshare:

0 - 4k: shared between two files
4k - 6k: hole beyond EOF, not shared
6k - 8k: delalloc extends

Then the reproducer attempted to unshare the extent between 0 and
0x2000 bytes, but the file size is 0xE00. This is likely the scenario
you were referring to?

Eventually the unshare code does:
first map: 0 - 4k - unshare successfully.
second map: 4k - 6k - hole, skip. Beyond EOF.
third map: 6k - 8k - delalloc, beyond EOF so needs zeroing.
Fires warnings because UNSHARE.

During the first call to iomap_unshare_iter(), iomap_length() returned
4k, so 4k bytes were unshared.
See discuss here[2] for more details.
>
> This all kind of has me wondering.. do we know the purpose of this
> warning/error in the first place? It seems like it's more of an
> "unexpected situation" than a specific problem. E.g., assuming the same
> page were mmap'd, I _think_ the read fault path would do the same sort
> of zeroing such that the unshare would see a fully uptodate folio and
> carry on as normal. I added the mapread op to the opsfile below to give
> that a quick test (remove the "skip" text to enable it), and it seems to
> prevent the error, but I've not confirmed whether that theory is
> actually what's happening.
>
>
> > FWIW, I also wonder if another way to handle this would be to just
> > restrict the range of iomap_file_unshare() to within EOF. IOW if a
> > caller passes a range beyond EOF, just process whatever part of the
> > range falls within EOF. It seems iomap isn't responsible for the file
> > extending aspect of the fallocate unshare command anyways.

It already does 'just process whatever part of the range falls within EOF'.
Check the above case.

I'm not sure if I fully understand what you mean. This patch does not
prevent unsharing extents within the EOF. This patch checks if pos is
beyond EOF, instead of checking if pos + length is beyond EOF. So the
extents within EOF should be unshared as usual.

BTW, maybe the check here should be
                  if (pos >= i_size_read(iter->inode))

If there is any misunderstanding, please let me know, thanks.

[1]: https://lore.kernel.org/all/0000000000008964f1061f8c32b6@xxxxxxxxxx/T/
[2]: https://lore.kernel.org/all/20240903054808.126799-1-sunjunchao2870@xxxxxxxxx/

>
> Thoughts?
>
> Brian
>
> [1] https://lore.kernel.org/linux-xfs/20240906114051.120743-1-bfoster@xxxxxxxxxx/
> [2] https://lore.kernel.org/fstests/20240906185606.136402-1-bfoster@xxxxxxxxxx/
> [3] fsx ops file:
>
> fallocate 0x3bc00 0x400 0
> write 0x3bc00 0x800 0x3c000
> clone_range 0x3bc00 0x400 0x0 0x3c400
> skip mapread 0x3c000 0x400 0x3c400
> fallocate 0x3bc00 0xc00 0x3c400 unshare
>

Thanks,
-- 
Julian Sun <sunjunchao2870@xxxxxxxxx>