On Tue, May 31, 2022 at 07:58:48AM +1000, Dave Chinner wrote: > On Mon, May 30, 2022 at 07:17:42PM +0300, Amir Goldstein wrote: > > On Mon, May 30, 2022 at 4:29 PM Zorro Lang <zlang@xxxxxxxxxx> wrote: > > > > > > On Mon, May 30, 2022 at 02:29:05PM +0300, Amir Goldstein wrote: > > > > For this test to run on overlayfs we open a different file to perform > > > > shutdown+fsync while keeping the writeback target file open. > > > > > > > > We should probably perform fsync on the writeback target file, but > > > > the bug is reproduced on xfs and overlayfs+xfs also as is. > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > > > --- > > > > > > > > Zorro, > > > > > > > > I tested that this test passes for both xfs and overlayfs+xfs on v5.18 > > > > and tested that both configs fail with the same warning on v5.10.109. > > > > > > > > I tried changing fsync to syncfs for the test to be more correct in the > > > > overlayfs case, but then golden output of xfs and overlayfs+xfs differ > > > > and that would need some more output filtering (or disregarding output > > > > completely). > > > > > > > > Since this minimal change does the job and does not change test behavior > > > > on xfs on any of the tested kernels, I thought it might be good enough. > > > > > > > > Thanks, > > > > Amir. > > > > > > > > tests/generic/623 | 5 ++++- > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/tests/generic/623 b/tests/generic/623 > > > > index ea016d91..bb36ad25 100755 > > > > --- a/tests/generic/623 > > > > +++ b/tests/generic/623 > > > > @@ -24,10 +24,13 @@ _scratch_mount > > > > # XFS had a regression where it failed to check shutdown status in the fault > > > > # path. This produced an iomap warning because writeback failure clears Uptodate > > > > # status on the page. > > > > +# For this test to run on overlayfs we open a different file to perform > > > > +# shutdown+fsync while keeping the writeback target file open. > > To trigger the original bug, the post-shutdown fsync needs to be run > on the original file. That triggers a writeback error writeback > which clears the uptodate state on the mapped page. The mwrite that > follows then trips over the state of the page and attempts IO > operations without first checking shutdown state. > > Hence moving the fsync to a different file will break the mechanism > the regression test uses to trigger the original bug. > > > > > file=$SCRATCH_MNT/file > > > > $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io > > > > ulimit -c 0 > > > > -$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c shutdown -c fsync \ > > > > +$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \ > > > > + -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \ > > > > > > Did you try to reproduce the original bug which this test case covers? > > > > > > > Yes. As I wrote: > > "tested that both configs fail with the same warning on v5.10.109" > > Meaning the same bug that the test triggered before my change > > in v5.10 is still triggered on xfs in v5.10 and it is triggered on both > > xfs and overlayfs+xfs in v5.10 with my change. > > It reproduced on 5.10, but not because of the reasons you are > suggesting. > > > > > > According to the "man xfs_io": > > > > > > open [[ -acdfrstRTPL ] path ] > > > Closes the current file, and opens the file specified by path instead. > > > > The documentation is incorrect. > > Current file is not closed. > > It is not closed, but it's also not the target of subsequent file > operations until you use "file 0" to switch back to it... I checked the xfsprogs/io/open.c, it truely doesn't "Close the current file", (need to update the man-page?) and it looks like make all opened files as a list, then operate them one by one(?): execve("/usr/sbin/xfs_io", ["xfs_io", "-c", "mmap 0 4k", "-c", "mwrite 0 4k", "-c", "open testfile", "-c", "fsync", "-c", "mwrite 0 4k", "testfile"], 0x7ffdc2285b78 /* 42 vars */) = 0 ... openat(AT_FDCWD, "testfile", O_RDWR) = 3 ... mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 3, 0) = 0x7efc04548000 ... openat(AT_FDCWD, "testfile", O_RDWR) = 4 ... fsync(3) = 0 fsync(4) = 0 exit_group(0) = ? +++ exited with 0 +++ Anyway, looks like we're stuck at here. So I have 3 crude methods which is just out of my mind, hope to get review:) 1) Skip this test for overlay directly, as this patch does. 2) Add a _require_real_scratch_shutdown() helper, require the $FSTYP supporting GOINGDOWN ioctl, don't fallback to the underlying fs. Then let generic/623 use it to skip overlay and other fs which really doesn't support shutdown. 3) Change xfsprogs/io/shutdown.c, let shutdown_f() accept an extra mountpoint path argument, to shutdown a specified mountpoint (?? not sure if it's reasonable for xfs_io :). Then let generic/623 require and use this feature. Thanks, Zorro > > > > Although I doubt if it always real close the current file, but you open to get > > > a new file descriptor, later operations will base on new fd. I don't know if > > > it still has original testing coverage. > > > > fsync on the fs root dir is not the same as fsync on the original file. > > Yup, and that's what will break the regression test. > > But why does it still fail on v5.10.109? > > Well, that's because of a quirk of the xfs_io fsync command. It > doesn't have the CMD_FLAG_ONESHOT flag set on it, so it operates on > *all open files*, not just the current file. > > IOWs, the misunderstanding of how the bug is triggered has been > covered up by the misunderstanding of how the xfs_io open file table > and the fsync command interact. > > > mwrite does not change because mwrite is not acted on open fd > > it is acted on memory mapping of mmap. > > > > I can either change fd again to first fd before doing fsync > > or change fsync to syncfs. > > Do not change it to syncfs - that executes completely different > writeback and metadata sync code paths with different error > propagation and may well result in very different behaviour from the > underlying filesystem. Fundamentally, syncfs() and fsync() are not > interchangeable from a regression test POV - you *might* get the > same high level result, but the low level code behaves very > differently... > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >