Re: [PATCH] generic/623: fix test for runing on overlayfs

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Tue, May 31, 2022 at 07:58:48AM +1000, Dave Chinner wrote:
> On Mon, May 30, 2022 at 07:17:42PM +0300, Amir Goldstein wrote:
> > On Mon, May 30, 2022 at 4:29 PM Zorro Lang <zlang@xxxxxxxxxx> wrote:
> > >
> > > On Mon, May 30, 2022 at 02:29:05PM +0300, Amir Goldstein wrote:
> > > > For this test to run on overlayfs we open a different file to perform
> > > > shutdown+fsync while keeping the writeback target file open.
> > > >
> > > > We should probably perform fsync on the writeback target file, but
> > > > the bug is reproduced on xfs and overlayfs+xfs also as is.
> > > >
> > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > > > ---
> > > >
> > > > Zorro,
> > > >
> > > > I tested that this test passes for both xfs and overlayfs+xfs on v5.18
> > > > and tested that both configs fail with the same warning on v5.10.109.
> > > >
> > > > I tried changing fsync to syncfs for the test to be more correct in the
> > > > overlayfs case, but then golden output of xfs and overlayfs+xfs differ
> > > > and that would need some more output filtering (or disregarding output
> > > > completely).
> > > >
> > > > Since this minimal change does the job and does not change test behavior
> > > > on xfs on any of the tested kernels, I thought it might be good enough.
> > > >
> > > > Thanks,
> > > > Amir.
> > > >
> > > >  tests/generic/623 | 5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tests/generic/623 b/tests/generic/623
> > > > index ea016d91..bb36ad25 100755
> > > > --- a/tests/generic/623
> > > > +++ b/tests/generic/623
> > > > @@ -24,10 +24,13 @@ _scratch_mount
> > > >  # XFS had a regression where it failed to check shutdown status in the fault
> > > >  # path. This produced an iomap warning because writeback failure clears Uptodate
> > > >  # status on the page.
> > > > +# For this test to run on overlayfs we open a different file to perform
> > > > +# shutdown+fsync while keeping the writeback target file open.
> 
> To trigger the original bug, the post-shutdown fsync needs to be run
> on the original file. That triggers a writeback error writeback
> which clears the uptodate state on the mapped page. The mwrite that
> follows then trips over the state of the page and attempts IO
> operations without first checking shutdown state.
> 
> Hence moving the fsync to a different file will break the mechanism
> the regression test uses to trigger the original bug.
> 
> > > >  file=$SCRATCH_MNT/file
> > > >  $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
> > > >  ulimit -c 0
> > > > -$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c shutdown -c fsync \
> > > > +$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
> > > > +     -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \
> > >
> > > Did you try to reproduce the original bug which this test case covers?
> > >
> > 
> > Yes. As I wrote:
> > "tested that both configs fail with the same warning on v5.10.109"
> > Meaning the same bug that the test triggered before my change
> > in v5.10 is still triggered on xfs in v5.10 and it is triggered on both
> > xfs and overlayfs+xfs in v5.10 with my change.
> 
> It reproduced on 5.10, but not because of the reasons you are
> suggesting.
> 
> > 
> > > According to the "man xfs_io":
> > >
> > >        open [[ -acdfrstRTPL ] path ]
> > >               Closes the current file, and opens the file specified by path instead.
> > 
> > The documentation is incorrect.
> > Current file is not closed.
> 
> It is not closed, but it's also not the target of subsequent file
> operations until you use "file 0" to switch back to it...

I checked the xfsprogs/io/open.c, it truely doesn't "Close the current file",
(need to update the man-page?) and it looks like make all opened files as a
list, then operate them one by one(?):

  execve("/usr/sbin/xfs_io", ["xfs_io", "-c", "mmap 0 4k", "-c", "mwrite 0 4k", "-c", "open testfile", "-c", "fsync", "-c", "mwrite 0 4k", "testfile"], 0x7ffdc2285b78 /* 42 vars */) = 0
  ...
  openat(AT_FDCWD, "testfile", O_RDWR)    = 3
  ...
  mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 3, 0) = 0x7efc04548000
  ...
  openat(AT_FDCWD, "testfile", O_RDWR)    = 4
  ...
  fsync(3)                                = 0
  fsync(4)                                = 0
  exit_group(0)                           = ?
  +++ exited with 0 +++

Anyway, looks like we're stuck at here. So I have 3 crude methods which is just
out of my mind, hope to get review:)

1) Skip this test for overlay directly, as this patch does.
2) Add a _require_real_scratch_shutdown() helper, require the $FSTYP supporting
   GOINGDOWN ioctl, don't fallback to the underlying fs. Then let generic/623
   use it to skip overlay and other fs which really doesn't support shutdown.
3) Change xfsprogs/io/shutdown.c, let shutdown_f() accept an extra mountpoint
   path argument, to shutdown a specified mountpoint (?? not sure if it's
   reasonable for xfs_io :). Then let generic/623 require and use this feature.

Thanks,
Zorro

> 
> > > Although I doubt if it always real close the current file, but you open to get
> > > a new file descriptor, later operations will base on new fd. I don't know if
> > > it still has original testing coverage.
> > 
> > fsync on the fs root dir is not the same as fsync on the original file.
> 
> Yup, and that's what will break the regression test.
> 
> But why does it still fail on v5.10.109?
> 
> Well, that's because of a quirk of the xfs_io fsync command.  It
> doesn't have the CMD_FLAG_ONESHOT flag set on it, so it operates on
> *all open files*, not just the current file.
> 
> IOWs, the misunderstanding of how the bug is triggered has been
> covered up by the misunderstanding of how the xfs_io open file table
> and the fsync command interact.
> 
> > mwrite does not change because mwrite is not acted on open fd
> > it is acted on memory mapping of mmap.
> > 
> > I can either change fd again to first fd before doing fsync
> > or change fsync to syncfs.
> 
> Do not change it to syncfs - that executes completely different
> writeback and metadata sync code paths with different error
> propagation and may well result in very different behaviour from the
> underlying filesystem.  Fundamentally, syncfs() and fsync() are not
> interchangeable from a regression test POV - you *might* get the
> same high level result, but the low level code behaves very
> differently...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 




[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux