Re: [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts

Amir Goldstein <amir73il@xxxxxxxxx> · Tue, 21 Jun 2022 05:53:49 +0300

On Tue, Jun 21, 2022 at 1:34 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jun 20, 2022 at 10:21:39PM +0800, Zorro Lang wrote:
> > On Mon, Jun 20, 2022 at 03:13:33PM +0300, Amir Goldstein wrote:
> > > On Mon, Jun 20, 2022 at 2:17 PM Zorro Lang <zlang@xxxxxxxxxx> wrote:
> > > >
> > > > On Sun, Jun 19, 2022 at 04:46:55PM +0300, Amir Goldstein wrote:
> > > > > Almost all of the tests that _require_freeze() fail to unfreeze
> > > > > scratch mount in case the test is interrupted while fs is frozen.
> > > > >
> > > > > Move the handling of unfreeze to generic check code.
> > > > > For now, tests only freeze scratch fs, but to be more robust, unfreeze
> > > > > both test and scratch fs following a call to _require_freeze().
> > > > >
> > > > > Tests could still hang if thier private _cleanup() routine tries
> > > > > to modify the frozen fs or wait for a blocked process. Fix the
> > > > > _cleanup() routine of xfs/011 to avoid that.
> > > > >
> > > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > > > > ---
> > > > >  check             | 14 ++++++++------
> > > > >  common/rc         |  5 +++--
> > > > >  tests/generic/390 |  2 --
> > > > >  tests/xfs/011     |  2 --
> > > > >  tests/xfs/517     |  1 -
> > > > >  5 files changed, 11 insertions(+), 13 deletions(-)
> > > > >
> > > > > diff --git a/check b/check
> > > > > index de11b37e..d6ee71aa 100755
> > > > > --- a/check
> > > > > +++ b/check
> > > > > @@ -527,17 +527,21 @@ _check_filesystems()
> > > > >  {
> > > > >       local ret=0
> > > > >
> > > > > +     # Make sure both test and scratch are unfrozen post _require_freeze()
> > > > > +     if [ -f ${RESULT_DIR}/require_freeze ]; then
> > > > > +             xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1
> > > > > +             xfs_freeze -u "$SCRATCH_MNT" >/dev/null 2>&1
> > > > > +     fi
> > > >
> > > > Hi Amir,
> > > >
> > > > I'm wondering why not let each freeze related test cases take care of "unfreeze"
> > > > by itself? If a case freeze a filesystem which isn't on TEST_DEV or SCRATCH_DEV
> > > > (e.g. a loop device base on a file in $TEST_DIR), or other situations, then the
> > > > case still can call _require_freeze, but "unfreeze TEST_DIR or SCRATCH_MNT" can't
> > > > help.
> > >
> > > That's also an option, but:
> > > 1. It is less robust, leaving room for human mistakes. Why is that better?
>
> It's not, but I'm addressing exactly this problem with the common
> _cleanup() infrastructure that I'm working on. Several of the issues
> that you are trying to address here are already dealt with in that
> series, and it doesn't add extra overhead to the test
> infrastructure for tests that don't use freeze/thaw.
>
> > > 2. Leftover frozen fs is quite harmful to subsequent test runs, so it
> > > is important
> > >     to avoid it
>
> Well, yes. It is a test bug, and the test shouldn't have bugs.
>
> > > 3. Tests that freeze loop/dm (generic/459, xfs/428, [*]) can and should cleanup
> > >     themselves. I will add that too, but in any case...
>
> Except now we have two methods of thawing the filesystem depending
> on where it is mounted and what device is in use. That's not an
> improvement.
>
> > > 4. unfreeze of tests and scratch fs is harmless even when it is not needed -
> > >     we may even want to do that at the start of tests run(?)
> >
> > I think Dave doesn't like to add common steps to thousands of cases, if without
> > a critical reason. So I hope to get more review points for this kind of changes.
>
> Doing work that in every test execution that is only actually
> necessary if a test is interrupted to work around a potential bug in
> 1 or 2 tests is really poor design and implementation. Just fix the
> test bugs, don't burden everything else with the additional overhead
> of checking whether the test might have a bug in it or not...
>
> > > [*] I noticed that generic/085 which _require_freeze() does not even
> > >     use xfs_freeze (intentionally) - it uses dmsetup suspend/resume,
> > >     so I guess _require_freeze() should be removed from that test.
> >
> > Agree, I think dm suspend through different userspace and kernel interface with
> > fsfreeze, so that require doesn't make sense.
>
> dmsuspend uses the internal kernel freeze API to freeze the
> filesytsems. Failures in those tests can leave filesytsems frozen,
> too, but these days we cannot thaw them with fs level freeze....
>
> > > Anyway, if you and others insist against this auto-unfreeze approach,
> > > I will post the patch to unfreeze fs in individual tests, but I won't be
> > > happy about it.
> >
> > From the functional side, I think this change makes sense. But if think about
> > the performance, let's get more opinions at first. If there's not objection,
> > we can have it.
>
> I think it's wrong from a functional side, too. The test harness is
> not responsible for individual test state cleanup - the tests
> themselves are responsible for that. The whole point of the
> _cleanup() consolidation I've started doing is that it will
> centralise all this common cleanup functionality and allow us to end
> up connecting it to _requires rules that indicate what functionality
> the test uses.  i.e. the end goal is that most tests do not need any
> cleanup - everything that needs cleaning up is defined by the
> _requires rules the test runs first and nothing else is needed.
>
> Only when tests do custom stuff will test specific cleanup functions
> be needed, just like they are required to do now to clean up after
> themselves. And that means, right now, that tests that freeze the
> filesystem still need to do the right thing in the _cleanup()
> functions up until the consolidation of cleanup means they don't
> need it anymore....
>
> AFAIC, you can go move all this stuff to check if you really
> want, but I'm just going to rip it all back out in short order
> because it just doesn't fit the cleanup model I'm trying to move the
> infrastructure towards.
>

As long as your cleanup is going to consolidate all this I will
just go ahead and fix the buggy tests now.

Thanks!
Amir.