On Mon, May 30, 2022 at 07:17:42PM +0300, Amir Goldstein wrote: > On Mon, May 30, 2022 at 4:29 PM Zorro Lang <zlang@xxxxxxxxxx> wrote: > > > > On Mon, May 30, 2022 at 02:29:05PM +0300, Amir Goldstein wrote: > > > For this test to run on overlayfs we open a different file to perform > > > shutdown+fsync while keeping the writeback target file open. > > > > > > We should probably perform fsync on the writeback target file, but > > > the bug is reproduced on xfs and overlayfs+xfs also as is. > > > > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > > --- > > > > > > Zorro, > > > > > > I tested that this test passes for both xfs and overlayfs+xfs on v5.18 > > > and tested that both configs fail with the same warning on v5.10.109. > > > > > > I tried changing fsync to syncfs for the test to be more correct in the > > > overlayfs case, but then golden output of xfs and overlayfs+xfs differ > > > and that would need some more output filtering (or disregarding output > > > completely). > > > > > > Since this minimal change does the job and does not change test behavior > > > on xfs on any of the tested kernels, I thought it might be good enough. > > > > > > Thanks, > > > Amir. > > > > > > tests/generic/623 | 5 ++++- > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/tests/generic/623 b/tests/generic/623 > > > index ea016d91..bb36ad25 100755 > > > --- a/tests/generic/623 > > > +++ b/tests/generic/623 > > > @@ -24,10 +24,13 @@ _scratch_mount > > > # XFS had a regression where it failed to check shutdown status in the fault > > > # path. This produced an iomap warning because writeback failure clears Uptodate > > > # status on the page. > > > +# For this test to run on overlayfs we open a different file to perform > > > +# shutdown+fsync while keeping the writeback target file open. To trigger the original bug, the post-shutdown fsync needs to be run on the original file. That triggers a writeback error writeback which clears the uptodate state on the mapped page. The mwrite that follows then trips over the state of the page and attempts IO operations without first checking shutdown state. Hence moving the fsync to a different file will break the mechanism the regression test uses to trigger the original bug. > > > file=$SCRATCH_MNT/file > > > $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io > > > ulimit -c 0 > > > -$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c shutdown -c fsync \ > > > +$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \ > > > + -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \ > > > > Did you try to reproduce the original bug which this test case covers? > > > > Yes. As I wrote: > "tested that both configs fail with the same warning on v5.10.109" > Meaning the same bug that the test triggered before my change > in v5.10 is still triggered on xfs in v5.10 and it is triggered on both > xfs and overlayfs+xfs in v5.10 with my change. It reproduced on 5.10, but not because of the reasons you are suggesting. > > > According to the "man xfs_io": > > > > open [[ -acdfrstRTPL ] path ] > > Closes the current file, and opens the file specified by path instead. > > The documentation is incorrect. > Current file is not closed. It is not closed, but it's also not the target of subsequent file operations until you use "file 0" to switch back to it.... > > Although I doubt if it always real close the current file, but you open to get > > a new file descriptor, later operations will base on new fd. I don't know if > > it still has original testing coverage. > > fsync on the fs root dir is not the same as fsync on the original file. Yup, and that's what will break the regression test. But why does it still fail on v5.10.109? Well, that's because of a quirk of the xfs_io fsync command. It doesn't have the CMD_FLAG_ONESHOT flag set on it, so it operates on *all open files*, not just the current file. IOWs, the misunderstanding of how the bug is triggered has been covered up by the misunderstanding of how the xfs_io open file table and the fsync command interact. > mwrite does not change because mwrite is not acted on open fd > it is acted on memory mapping of mmap. > > I can either change fd again to first fd before doing fsync > or change fsync to syncfs. Do not change it to syncfs - that executes completely different writeback and metadata sync code paths with different error propagation and may well result in very different behaviour from the underlying filesystem. Fundamentally, syncfs() and fsync() are not interchangeable from a regression test POV - you *might* get the same high level result, but the low level code behaves very differently... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx