Re: [PATCHSET 0/3] fstests: direct specification of looping test duration

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Thu, 13 Apr 2023 07:47:08 -0700

On Thu, Apr 13, 2023 at 12:48:36PM +0200, Andrey Albershteyn wrote:
> On Tue, Apr 11, 2023 at 11:13:46AM -0700, Darrick J. Wong wrote:
> > Hi all,
> > 
> > One of the things that I do as a maintainer is to designate a handful of
> > VMs to run fstests for unusually long periods of time.  This practice I
> > call long term soak testing.  There are actually three separate fleets
> > for this -- one runs alongside the nightly builds, one runs alongside
> > weekly rebases, and the last one runs stable releases.
> > 
> > My interactions with all three fleets are pretty much the same -- load
> > current builds of software, and try to run the exerciser tests for a
> > duration of time -- 12 hours, 6.5 days, 30 days, etc.  TIME_FACTOR does
> > not work well for this usage model, because it is difficult to guess
> > the correct time factor given that the VMs are heterogeneous and the IO
> > completion rate is not perfectly predictable.
> > 
> > Worse yet, if you want to run (say) all the recoveryloop tests on one VM
> > (because recoveryloop is prone to crashing), it's impossible to set a
> > TIME_FACTOR so that each loop test gets equal runtime.  That can be
> > hacked around with config sections, but that doesn't solve the first
> > problem.
> > 
> > This series introduces a new configuration variable, SOAK_DURATION, that
> > lets test runners directly control the runtime of various long soak and
> > looping recovery tests.  This is intended to be an alternative to
> > TIME_FACTOR, since that variable usually adjusts operation counts, which
> > are proportional to runtime but otherwise not a direct measure of time.
> > 
> > With this override in place, I can configure the long soak fleet to run
> > for exactly as long as I want them to, and they actually hit the time
> > budget targets.  The recoveryloop fleet now divides looping-test time
> > equally among the four that are in that group so that they all get ~3
> > hours of coverage every night.
> > 
> > There are more tests that could use this than I actually modified here,
> > but I've done enough to show this off as a proof of concept.
> > 
> > If you're going to start using this mess, you probably ought to just
> > pull from my git trees, which are linked below.
> > 
> > This is an extraordinary way to destroy everything.  Enjoy!
> > Comments and questions are, as always, welcome.
> > 
> > --D
> > 
> > fstests git tree:
> > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=soak-duration
> > ---
> >  check                 |   14 +++++++++
> >  common/config         |    7 ++++
> >  common/fuzzy          |    7 ++++
> >  common/rc             |   34 +++++++++++++++++++++
> >  common/report         |    1 +
> >  ltp/fsstress.c        |   78 +++++++++++++++++++++++++++++++++++++++++++++++--
> >  ltp/fsx.c             |   50 +++++++++++++++++++++++++++++++
> >  src/soak_duration.awk |   23 ++++++++++++++
> >  tests/generic/019     |    1 +
> >  tests/generic/388     |    2 +
> >  tests/generic/475     |    2 +
> >  tests/generic/476     |    7 +++-
> >  tests/generic/482     |    5 +++
> >  tests/generic/521     |    1 +
> >  tests/generic/522     |    1 +
> >  tests/generic/642     |    1 +
> >  tests/generic/648     |    8 +++--
> >  17 files changed, 229 insertions(+), 13 deletions(-)
> >  create mode 100644 src/soak_duration.awk
> > 
> 
> The set looks good to me (the second commit uses a different var name,
> but fine by me)

Which variable name, specifically?

--D

> Reviewed-by: Andrey Albershteyn <aalbersh@xxxxxxxxxx>
> 
> -- 
> - Andrey
>
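
For readers following along, here is a rough sketch of the distinction the
cover letter draws between scaling operation counts (TIME_FACTOR) and
bounding wall-clock time (SOAK_DURATION), plus the equal split of a nightly
budget across the looping tests.  This is not code from the series.  The
function names, the use of seconds as the unit, the placeholder test names,
and the 12-hour total (inferred from four tests at ~3 hours each) are all
assumptions made for illustration.

```python
import time


def per_test_budget(total_seconds, tests):
    """Split a total time budget equally among the named tests."""
    share = total_seconds / len(tests)
    return {name: share for name in tests}


def run_until_deadline(step, duration_seconds, clock=time.monotonic):
    """Call step() until duration_seconds have elapsed.

    Unlike a fixed operation count, the number of iterations is whatever
    the machine manages within the budget, so slow and fast VMs both stop
    at the same wall-clock time.  Returns the iteration count.
    """
    deadline = clock() + duration_seconds
    iterations = 0
    while clock() < deadline:
        step()
        iterations += 1
    return iterations


# Hypothetical placeholders, not the actual members of the recoveryloop group.
LOOP_TESTS = ["loop_a", "loop_b", "loop_c", "loop_d"]

# Assumed 12-hour nightly budget, giving each of the four tests 3 hours.
budgets = per_test_budget(12 * 3600, LOOP_TESTS)
```

The deadline loop takes the clock as a parameter only so that it can be
exercised with a fake clock instead of waiting for real time to pass.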