On Thu, Jan 25, 2024 at 02:10:37PM -0800, Luis Chamberlain wrote:
> The kdevops test runner has supported a custom SOAK_DURATION for
> fstests, however we were not providing any guidance. This means folks
> likely disable this. Throw a bone and provide some basic guidance and
> use 2.5 hours as the default value. There are about 46 tests today
> which use soak duration, so if you are testing serially it increases
> total test time by about 5 days over the previously known total test
> time.
>
> Note that if you are using kernel-ci and using a max loop goal of 100
> that means 500 days extra, so about 1.3 years extra total test time.
> If enabling soak duration you may want to then re-evaluate your loop
> target goal for kernel-ci for kdevops.

Yikes, I wouldn't combine multiple runs with large SOAK_DURATION. ;)

You all might consider kicking all the soak tests over to a separate VM
or VMs so that the long soak tests do not hold up the rest of the run.

> Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> ---
>
> Chandan, Amir, lemme know what you think of a 2.5 hour default if soak
> duration is enabled. The only thing is the math indicates that if you
> are going to enable kernel-ci we won't finish this year.
>
> To be clear, we've picked up testing with soak duration seriously for
> our LBS testing. It is why we've been able to find pretty hard to
> reproduce issues even on the page cache for the baseline [0], ie, without
> LBS. While folks seem to have found value in adopting 2.5 hours and in
> the results we have found, it obviously raises a scaling issue to
> consider when deciding when we're done with testing our baseline.
>
> At first I wrote this patch just to provide basic guidance for kdevops,
> but after doing a bit of the math on how it also extends total test
> time, *with* our kernel-ci effort, it reveals clearly we should probably
> reconsider lowering the kernel-ci threshold a bit if adopting soak
> duration.
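To sanity-check the arithmetic above, here is a small Python sketch (not part of kdevops; `extra_days` is a made-up helper). It treats SOAK_DURATION as pure extra run time for each of the 46 soak-aware tests, which is only an approximation. With the patch's 9900 second "High" preset it gives about 5.3 extra days per serial pass, and about 527 extra days, roughly 1.44 years, for a kernel-ci loop goal of 100.

```python
# Rough model: every soak-aware test runs about SOAK_DURATION seconds longer.
SECONDS_PER_DAY = 86400


def extra_days(soak_seconds: int, num_tests: int = 46, loops: int = 1) -> float:
    """Extra serial test time in days for a given SOAK_DURATION.

    num_tests defaults to the 46 soak-aware tests cited in the patch;
    loops models a kernel-ci loop goal (each loop reruns every test).
    """
    return soak_seconds * num_tests * loops / SECONDS_PER_DAY


if __name__ == "__main__":
    per_pass = extra_days(9900)
    kernel_ci = extra_days(9900, loops=100)
    print(f"one serial pass: {per_pass:.1f} extra days")
    print(f"100 kernel-ci loops: {kernel_ci:.0f} extra days "
          f"({kernel_ci / 365:.2f} years)")
```

The same helper reproduces the help-text figures for the other presets, e.g. `extra_days(172800)` is 92 days for the 48 hour setting.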
>
> CC'ing a wider audience to get a better idea of what folks might
> consider a sensible value for their own testing too. From what we've
> been observing, SOAK_DURATION allows us to catch bugs faster than just
> increasing the kernel-ci count; however, using both lets us catch even
> more bugs.
>
> To help *reduce* the amount of time to test we've deployed many kdevops
> XFS clusters to help test the baseline. This is why our kernel-ci count
> is now about 50-60 with a soak duration of about 2.5 hours.
>
> Also please note that the bugs reported so far are the ones with
> crashes. There are other failures too, but we just haven't had the time
> to dissect and report non-fatal failures, as crashes have been our
> priority.
>
> [0] https://github.com/linux-kdevops/kdevops/blob/master/docs/xfs-bugs.md
>
>  playbooks/roles/fstests/defaults/main.yml |  3 +
>  workflows/fstests/Kconfig                 | 89 ++++++++++++++++++++---
>  workflows/fstests/Makefile.sparsefiles    |  4 +
>  3 files changed, 87 insertions(+), 9 deletions(-)
>
> diff --git a/playbooks/roles/fstests/defaults/main.yml b/playbooks/roles/fstests/defaults/main.yml
> index 2f70f9549cde..4a1f5dec5827 100644
> --- a/playbooks/roles/fstests/defaults/main.yml
> +++ b/playbooks/roles/fstests/defaults/main.yml
> @@ -30,6 +30,9 @@ fstests_test_logdev_mkfs_opts: "/dev/null"
>  fstests_test_dev_zns: "/dev/null"
>  fstests_zns_enabled: False
>
> +fstests_soak_duration_enable: False
> +fstests_soak_duration: 0
> +
>  fstests_uses_no_devices: False
>  fstests_generate_simple_config_enable: False
>  fstests_generate_nvme_live_config_enable: False
> diff --git a/workflows/fstests/Kconfig b/workflows/fstests/Kconfig
> index 985a7847b6c7..bbd8927b3cd3 100644
> --- a/workflows/fstests/Kconfig
> +++ b/workflows/fstests/Kconfig
> @@ -760,15 +760,23 @@ config FSTESTS_RUN_LARGE_DISK_TESTS
>  	  to run. The "large disk" requirement is test dependent, but
>  	  typically, it means a disk with capacity of at several 10G.
>
> -config FSTESTS_SOAK_DURATION
> -	int "Custom Soak duration to be used"
> -	default 0
> +config FSTESTS_ENABLE_SOAK_DURATION
> +	bool "Enable custom soak duration time"
>  	help
> -	  Custom Soak duration to be used during test execution. If you set this
> -	  to a non-zero value then fstests will increase the amount of time it
> -	  takes to run certain tests which are time based and support using
> -	  SOAK_DURATION. A moderate high value setting for this is 9900 which is
> -	  2.5 hours.
> +	  Enable soak duration to be used during test execution. If you are not
> +	  interested in extending your testing then leave this disabled.
> +
> +	  Using a custom soak duration to a non-zero value then fstests will
> +	  increase the amount of time it takes to run certain tests which are
> +	  time based and support using SOAK_DURATION. A moderate high value
> +	  setting for this is 9900 which is 2.5 hours.

"A moderately high setting for this is "2.5h" for 2.5 hours."

FWIW, the ./check parser translates floating point numbers with suffixes
to integer seconds; see soak_duration.awk. The part I don't know is if
kdevops merely passes through the value as a string or actually treats
this as an integer. If the latter, then please ignore my comment.

> +
> +	  Note that we have 46 tests today which will be able to use soak
> +	  duration if set. This means your test time will increase by the
> +	  soak duration * these number of tests. When soak duration is
> +	  enabled the test specific watchdog fstests_watchdog.py will be
> +	  aware of tests which require soak duration and consider before
> +	  reporting a possible hang.
>
>  	  As of 2023-10-31 that consists of the following tests which use either
>  	  fsstress or fsx or fio.
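To illustrate the suggestion above about suffixed values, here is a hypothetical Python converter in the spirit of what ./check does. It is not a port of fstests' soak_duration.awk: the accepted suffixes (s, m, h, d, w) and the summing of pairs like "1h30m" are assumptions for illustration. Note that "2.5h" comes out as 9000 seconds, whereas the 9900 used in the patch is 2.75 hours.

```python
import re

# Assumed suffix table for illustration; fstests' soak_duration.awk may
# accept a different set of units.
UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 7 * 86400}

# One "<number><optional suffix>" pair, with optional surrounding whitespace.
_PAIR = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([smhdw]?)\s*")


def parse_soak_duration(text: str) -> int:
    """Convert e.g. "2.5h", "1w", "1h30m" or "9900" to integer seconds.

    Multiple pairs are summed; a bare number is taken as seconds.
    Raises ValueError for anything else.
    """
    text = text.lower()
    if not text.strip():
        raise ValueError("empty duration")
    pos, total = 0, 0.0
    while pos < len(text):
        m = _PAIR.match(text, pos)
        if m is None:
            raise ValueError(f"unparsable duration: {text!r}")
        total += float(m.group(1)) * UNIT_SECONDS[m.group(2)]
        pos = m.end()
    return round(total)


if __name__ == "__main__":
    for value in ("2.5h", "9900", "1w"):
        print(value, "->", parse_soak_duration(value))
```

If kdevops only ever hands the value through as a string, a converter like this would not be needed on the kdevops side at all; it matters only if the value is treated as an integer somewhere before ./check sees it.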
>  	  Tests either use SOAK_DURATION directly or they
> @@ -786,7 +794,7 @@ config FSTESTS_SOAK_DURATION
>  	  - generic/648 - fsstress + disk failures on loopback
>  	  - generic/650 - fsstress - multithreaded write + CPU hotplug
>
> -	  The tests below use _scratch_xfs_stress_scrub() to stress
> +	  All the tests below use _scratch_xfs_stress_scrub() to stress
>  	  test an with fsstress with scrub or an alternate xfs_db operation.
>
>  	  - xfs/285
> @@ -825,4 +833,67 @@ config FSTESTS_SOAK_DURATION
>  	  - xfs/729
>  	  - xfs/800
>
> +if FSTESTS_ENABLE_SOAK_DURATION
> +
> +choice
> +	prompt "Soak duration value to use"
> +	default FSTESTS_SOAK_DURATION_HIGH
> +
> +config FSTESTS_SOAK_DURATION_CUSTOM
> +	bool "Custom"
> +	help
> +	  You want to specify the value yourself.
> +
> +config FSTESTS_SOAK_DURATION_PATHALOGICAL

"PATHOLOGICAL", and yes that high a setting is pathological. ;)

(Unless you're allocating one soak-fstest per VM in which case "1w" might
be appropriate.)

> +	bool "High (48 hours)"
> +	help
> +	  Use 48 hours for soak duration.
> +
> +	  Using this with 46 tests known to use soak duration means your test
> +	  time will increase by about 92 days, or a bit over 3 months if run
> +	  serially.
> +
> +config FSTESTS_SOAK_DURATION_HIGH
> +	bool "High (2.5 hours)"
> +	help
> +	  Use 2.5 hours for soak duration.
> +
> +	  Using this with 46 tests known to use soak duration means your test
> +	  time will increase by about 5 days if run serially.
> +
> +config FSTESTS_SOAK_DURATION_MID
> +	bool "Mid (1 hour)"
> +	help
> +	  Use 1 hour for soak duration.
> +
> +	  Using this with 46 tests known to use soak duration means your test
> +	  time will increase by about 2 days if run serially.

I wonder, is there any way to scan the number of soak tests to generate
these figures automatically at configure time? I'd guess no, since
kdevops kconfig comes before pulling and compiling, right?

--D

> +
> +config FSTESTS_SOAK_DURATION_LOW
> +	bool "Low (30 minutes)
> +	help
> +	  Use 30 minutes for soak duration.
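On the question of generating these figures at configure time: assuming an fstests checkout were available when Kconfig runs (which, as noted, is probably not the case today), a rough count could come from scanning the test scripts. The Python sketch below is hypothetical (`find_soak_tests` and `count_soak.py` are made-up names) and only looks for scripts that mention SOAK_DURATION or _scratch_xfs_stress_scrub by name, so tests that pick up SOAK_DURATION indirectly through other helpers would be missed; treat the result as a lower bound.

```python
import os
import re
import sys

# Assumed markers for "soak-aware" tests (illustrative only): scripts that
# name SOAK_DURATION or the XFS scrub stress helper directly.
SOAK_MARKERS = re.compile(r"SOAK_DURATION|_scratch_xfs_stress_scrub")


def find_soak_tests(tests_dir: str) -> list[str]:
    """Return sorted "<group>/<number>" names of matching test scripts."""
    found = []
    for group in sorted(os.listdir(tests_dir)):
        group_path = os.path.join(tests_dir, group)
        if not os.path.isdir(group_path):
            continue
        for name in sorted(os.listdir(group_path)):
            path = os.path.join(group_path, name)
            # Test scripts are numbered files; skip *.out, Makefiles, etc.
            if not (name.isdigit() and os.path.isfile(path)):
                continue
            with open(path, encoding="utf-8", errors="replace") as f:
                if SOAK_MARKERS.search(f.read()):
                    found.append(f"{group}/{name}")
    return found


if __name__ == "__main__":
    # Hypothetical usage: python count_soak.py /path/to/fstests/tests
    tests = find_soak_tests(sys.argv[1] if len(sys.argv) > 1 else "tests")
    print(f"{len(tests)} soak-aware tests found")
```

Multiplying the length of the returned list by the chosen soak duration would give the serial-time estimate that the help text currently hardcodes for 46 tests.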
> +
> +	  Using this with 46 tests known to use soak duration means your test
> +	  time will increase by about 1 day if run serially.
> +
> +endchoice
> +
> +config FSTESTS_SOAK_DURATION_CUSTOM_VAL
> +	int "Custom soak duration value (seconds)"
> +	default 0
> +	depends on FSTESTS_SOAK_DURATION_CUSTOM
> +	help
> +	  Enter your custom soak duration value in seconds.
> +
> +endif # FSTESTS_ENABLE_SOAK_DURATION
> +
> +config FSTESTS_SOAK_DURATION
> +	default 0 if !FSTESTS_ENABLE_SOAK_DURATION
> +	default FSTESTS_SOAK_DURATION_CUSTOM_VAL if FSTESTS_SOAK_DURATION_CUSTOM
> +	default 1800 if FSTESTS_SOAK_DURATION_LOW
> +	default 3600 if FSTESTS_SOAK_DURATION_MID
> +	default 9900 if FSTESTS_SOAK_DURATION_HIGH
> +	default 172800 if FSTESTS_SOAK_DURATION_PATHALOGICAL
> +
>  endif # KDEVOPS_WORKFLOW_ENABLE_FSTESTS
> diff --git a/workflows/fstests/Makefile.sparsefiles b/workflows/fstests/Makefile.sparsefiles
> index c5ca20a9c462..7dd129c4f9cc 100644
> --- a/workflows/fstests/Makefile.sparsefiles
> +++ b/workflows/fstests/Makefile.sparsefiles
> @@ -44,6 +44,10 @@ FSTESTS_ARGS += run_large_disk_tests='$(FSTESTS_RUN_LARGE_DISK_TESTS)'
>  FSTESTS_ARGS += run_auto_group_tests='$(FSTESTS_RUN_AUTO_GROUP_TESTS)'
>  FSTESTS_ARGS += run_custom_group_tests='$(FSTESTS_RUN_CUSTOM_GROUP_TESTS)'
>  FSTESTS_ARGS += exclude_test_groups='$(CONFIG_FSTESTS_EXCLUDE_TEST_GROUPS)'
> +
> +ifeq (y,$(CONFIG_FSTESTS_ENABLE_SOAK_DURATION))
> +FSTESTS_ARGS += fstests_soak_duration_enable='True'
> +endif
>  FSTESTS_ARGS += fstests_soak_duration='$(CONFIG_FSTESTS_SOAK_DURATION)'
>
>  ifeq (y,$(CONFIG_FSTESTS_ENABLE_RUN_CUSTOM_TESTS))
> --
> 2.42.0
>
>
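For reference, the way the patch resolves FSTESTS_SOAK_DURATION, and what Makefile.sparsefiles then appends to FSTESTS_ARGS, can be modeled roughly as follows. This is a hypothetical Python mirror of the Kconfig defaults and the Makefile `ifeq`, taking a dict of CONFIG_* values as they would appear in .config; it is not kdevops code, and the function names are made up. It shows that the Makefile wraps the value in single quotes (e.g. `fstests_soak_duration='9900'`); how the playbook then interprets that value is not visible in this patch.

```python
# Hypothetical mirror of the patch's Kconfig defaults: only one choice
# symbol can be "y", so the first match wins. Symbol names follow the
# patch, including its PATHALOGICAL spelling.
PRESET_SECONDS = {
    "CONFIG_FSTESTS_SOAK_DURATION_LOW": 1800,
    "CONFIG_FSTESTS_SOAK_DURATION_MID": 3600,
    "CONFIG_FSTESTS_SOAK_DURATION_HIGH": 9900,
    "CONFIG_FSTESTS_SOAK_DURATION_PATHALOGICAL": 172800,
}


def resolve_soak_duration(config: dict[str, str]) -> int:
    """Value FSTESTS_SOAK_DURATION would take for a given .config dict."""
    if config.get("CONFIG_FSTESTS_ENABLE_SOAK_DURATION") != "y":
        return 0
    if config.get("CONFIG_FSTESTS_SOAK_DURATION_CUSTOM") == "y":
        return int(config.get("CONFIG_FSTESTS_SOAK_DURATION_CUSTOM_VAL", "0"))
    for symbol, seconds in PRESET_SECONDS.items():
        if config.get(symbol) == "y":
            return seconds
    return 0


def soak_fstests_args(config: dict[str, str]) -> list[str]:
    """Soak-related FSTESTS_ARGS entries, following Makefile.sparsefiles."""
    args = []
    if config.get("CONFIG_FSTESTS_ENABLE_SOAK_DURATION") == "y":
        args.append("fstests_soak_duration_enable='True'")
    # The Makefile wraps the value in single quotes.
    args.append(f"fstests_soak_duration='{resolve_soak_duration(config)}'")
    return args


if __name__ == "__main__":
    cfg = {
        "CONFIG_FSTESTS_ENABLE_SOAK_DURATION": "y",
        "CONFIG_FSTESTS_SOAK_DURATION_HIGH": "y",
    }
    print(" ".join(soak_fstests_args(cfg)))
```

With soak duration disabled, only `fstests_soak_duration='0'` is emitted, which matches the `fstests_soak_duration: 0` default added to the role's defaults/main.yml.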