On Tue, Apr 11, 2023 at 11:13:46AM -0700, Darrick J. Wong wrote:
> Hi all,
>
> One of the things that I do as a maintainer is to designate a handful of
> VMs to run fstests for unusually long periods of time. This practice I
> call long term soak testing. There are actually three separate fleets
> for this -- one runs alongside the nightly builds, one runs alongside
> weekly rebases, and the last one runs stable releases.
>
> My interactions with all three fleets are pretty much the same -- load
> current builds of software, and try to run the exerciser tests for a
> duration of time -- 12 hours, 6.5 days, 30 days, etc. TIME_FACTOR does
> not work well for this usage model, because it is difficult to guess
> the correct time factor given that the VMs are heterogeneous and the IO
> completion rate is not perfectly predictable.
>
> Worse yet, if you want to run (say) all the recoveryloop tests on one VM
> (because recoveryloop is prone to crashing), it's impossible to set a
> TIME_FACTOR so that each loop test gets equal runtime. That can be
> hacked around with config sections, but that doesn't solve the first
> problem.
>
> This series introduces a new configuration variable, SOAK_DURATION, that
> allows test runners to control directly various long soak and looping
> recovery tests. This is intended to be an alternative to TIME_FACTOR,
> since that variable usually adjusts operation counts, which are
> proportional to runtime but otherwise not a direct measure of time.
>
> With this override in place, I can configure the long soak fleet to run
> for exactly as long as I want them to, and they actually hit the time
> budget targets. The recoveryloop fleet now divides looping-test time
> equally among the four that are in that group so that they all get ~3
> hours of coverage every night.
>
> There are more tests that could use this than I actually modified here,
> but I've done enough to show this off as a proof of concept.
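
For illustration only: a minimal sketch of the even split described
above, assuming a total nightly budget of 12 hours shared by the four
recoveryloop tests. The helper names to_seconds and per_test_budget, and
the unit suffixes they accept, are hypothetical and are not taken from
the series; the series' own parsing lives in src/soak_duration.awk and
may differ.

```shell
#!/bin/sh
# Hypothetical helper: convert "12h", "6.5d", "90m", or plain seconds
# into whole seconds. The suffix set is an assumption for illustration.
to_seconds() {
	awk -v d="$1" 'BEGIN {
		n = d + 0                    # leading numeric part, e.g. 6.5 from "6.5d"
		u = substr(d, length(d), 1)  # last character, possibly a unit suffix
		if (u == "m") n *= 60
		else if (u == "h") n *= 3600
		else if (u == "d") n *= 86400
		printf "%d\n", n
	}'
}

# Hypothetical helper: split a total time budget evenly across N tests.
per_test_budget() {
	echo $(( $(to_seconds "$1") / $2 ))
}

per_test_budget 12h 4	# prints 10800, i.e. 3 hours per test
```

The result is a per-test duration in seconds, which is the kind of
direct time measure that an operation-count multiplier such as
TIME_FACTOR cannot express.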
>
> If you're going to start using this mess, you probably ought to just
> pull from my git trees, which are linked below.
>
> This is an extraordinary way to destroy everything. Enjoy!
> Comments and questions are, as always, welcome.
>
> --D
>
> fstests git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=soak-duration
> ---
>  check                 |   14 +++++++++
>  common/config         |    7 ++++
>  common/fuzzy          |    7 ++++
>  common/rc             |   34 +++++++++++++++++++++
>  common/report         |    1 +
>  ltp/fsstress.c        |   78 +++++++++++++++++++++++++++++++++++++++++++++++--
>  ltp/fsx.c             |   50 +++++++++++++++++++++++++++++++
>  src/soak_duration.awk |   23 ++++++++++++++
>  tests/generic/019     |    1 +
>  tests/generic/388     |    2 +
>  tests/generic/475     |    2 +
>  tests/generic/476     |    7 +++-
>  tests/generic/482     |    5 +++
>  tests/generic/521     |    1 +
>  tests/generic/522     |    1 +
>  tests/generic/642     |    1 +
>  tests/generic/648     |    8 +++--
>  17 files changed, 229 insertions(+), 13 deletions(-)
>  create mode 100644 src/soak_duration.awk
>

The set looks good to me (the second commit has a different var name,
but that's fine by me).

Reviewed-by: Andrey Albershteyn <aalbersh@xxxxxxxxxx>

-- 
- Andrey