Same random number generation across iterations of do_io?

Adam Horshack <horshack@xxxxxxxx> · Wed, 15 Feb 2023 14:43:32 +0000

I'm working on a fix for the issue described here:

https://github.com/axboe/fio/issues/1517#issuecomment-1430282533

In short, with randrepeat=0, verify enabled, and multiple workload iterations (loops=x or time_based), the second iteration of do_verify() fails with header random-seed miscompares. The second iteration of do_io() starts with the verify seed where the first iteration left off (by design when randrepeat=0), whereas do_verify() always resets the verify seed to its init-time value (i.e., the value used by the first iteration of do_io() and do_verify()).

IMO the best fix would be to have do_verify() reset the seed to the value it had at the start of the most recent do_io(), rather than to its job-init value. That lets us keep seed verification enabled for mixed read-write workloads while still getting per-iteration-unique generation for randrepeat=0. The alternative is to disable seed checking for mixed workloads, the same as fio currently does for write-only workloads and for verify_backlog workloads:

https://github.com/axboe/fio/blob/1bd16cf9c113fcf9d49cae07da50e8a5c7a784ee/verify.c#L920-L925
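To make the failure and the first proposed fix concrete, here is a minimal toy model in C. It is a sketch under stated assumptions, not fio code: every toy_* name is hypothetical, the generator is a plain 64-bit LCG standing in for fio's real PRNG, and "headers" are just an array of seeds. It only models the ordering of seed resets described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_BLOCKS 16

/* Toy stand-in for the verify-seed generator; NOT fio's real PRNG. */
struct toy_rand { uint64_t state; };

static uint64_t toy_next(struct toy_rand *r)
{
    r->state = r->state * 6364136223846793005ULL + 1442695040888963407ULL;
    return r->state >> 33;
}

/* Hypothetical job state, reduced to what the seed check needs. */
struct toy_job {
    uint64_t init_seed;                 /* seed chosen at job init */
    struct toy_rand verify_rand;        /* running verify-seed state */
    uint64_t io_start_state;            /* state when do_io last started */
    uint64_t hdr_seed[TOY_MAX_BLOCKS];  /* seed stored in each block header */
    int nblocks;
};

static void toy_init(struct toy_job *j, uint64_t seed, int nblocks)
{
    assert(nblocks > 0 && nblocks <= TOY_MAX_BLOCKS);
    j->init_seed = seed;
    j->verify_rand.state = seed;
    j->io_start_state = seed;
    j->nblocks = nblocks;
}

/* Write phase: as with randrepeat=0, the generator carries on from the
 * previous iteration instead of being reset. */
static void toy_do_io(struct toy_job *j)
{
    j->io_start_state = j->verify_rand.state;
    for (int i = 0; i < j->nblocks; i++)
        j->hdr_seed[i] = toy_next(&j->verify_rand);
}

/* Verify phase: regenerate the expected seeds and count miscompares.
 * reset_to_io_start=false models the current reset to the init seed;
 * true models the proposed reset to the last do_io starting point. */
static int toy_do_verify(const struct toy_job *j, bool reset_to_io_start)
{
    struct toy_rand r = {
        reset_to_io_start ? j->io_start_state : j->init_seed
    };
    int bad = 0;

    for (int i = 0; i < j->nblocks; i++)
        if (toy_next(&r) != j->hdr_seed[i])
            bad++;
    return bad;
}

/* Run `loops` write+verify iterations; return total miscompares. */
int toy_run(int loops, bool reset_to_io_start)
{
    struct toy_job j;
    int bad = 0;

    toy_init(&j, 12345, 8);
    for (int l = 0; l < loops; l++) {
        toy_do_io(&j);
        bad += toy_do_verify(&j, reset_to_io_start);
    }
    return bad;
}
```

In this model toy_run(1, false) verifies cleanly, toy_run(2, false) reports miscompares on the second iteration, and toy_run(2, true) verifies cleanly again, which mirrors the behaviour described in the issue.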

Putting this aside, I have a more general question. Currently, when randrepeat=1, each iteration of do_io() generates the same random values for its work, including offsets, lengths, verify seeds, etc. This makes sense for the first iteration of do_io(), since the stated purpose of randrepeat is "so the pattern is repeatable across runs." But does it make sense for the second, third, and later iterations of do_io() in configs that enable multiple iterations, like time_based and loops=x? As currently coded, each iteration does exactly the same work as the first, which means the additional iterations provide no additional test coverage. It's not clear the user expects this behavior from randrepeat=1; in my reading, the user would expect each run to produce the same workload, not every iteration within the same run. Otherwise, is there really a benefit to time_based or loops=x in terms of test coverage?
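The distinction between "repeatable across runs" and "identical across iterations" can be shown with a small toy in C. This is only an illustration, not fio code: the toy_* names are hypothetical and the LCG is a stand-in for fio's offset generator. Both variants start from the same fixed seed, so both are repeatable from run to run; they differ only in whether the state is reseeded before the second iteration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_IOS 6

/* Toy LCG standing in for the offset generator; NOT fio's real PRNG. */
static uint64_t toy_offset_next(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return *state >> 33;
}

/* Fill offs[] with the block offsets one iteration would touch. */
static void toy_iteration_offsets(uint64_t *state, uint64_t nr_blocks,
                                  uint64_t offs[TOY_IOS])
{
    for (int i = 0; i < TOY_IOS; i++)
        offs[i] = toy_offset_next(state) % nr_blocks;
}

/* Returns true if the second iteration repeats the first exactly.
 * reseed_each_iteration=true models the behaviour described above for
 * randrepeat=1; false keeps the fixed initial seed but lets the state
 * carry over, so later iterations touch different offsets. */
bool toy_iterations_identical(bool reseed_each_iteration)
{
    const uint64_t seed = 0x1234;
    const uint64_t nr_blocks = 1u << 20;
    uint64_t state = seed;
    uint64_t first[TOY_IOS], second[TOY_IOS];

    toy_iteration_offsets(&state, nr_blocks, first);
    if (reseed_each_iteration)
        state = seed;
    toy_iteration_offsets(&state, nr_blocks, second);

    return memcmp(first, second, sizeof(first)) == 0;
}
```

Here toy_iterations_identical(true) returns true and toy_iterations_identical(false) returns false, yet either variant produces the same sequence every time the program is run.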

Adam