Alexey,

On 2020/05/02 3:52, Alexey Dobriyan wrote:
> On Fri, May 01, 2020 at 01:34:32AM +0000, Damien Le Moal wrote:
>> On 2020/04/30 21:41, Alexey Dobriyan wrote:
>>> It is not possible to maintain equal per-thread iodepth. The way code
>>> is written, "max_open_zones" acts as a global limit, and one thread
>>> opens all "max_open_zones" for itself and others starve for available
>>> zones and _exit_ prematurely.
>>>
>>> This config is guaranteed to make equal number of zone resets/IO now:
>>> each thread generates identical pattern and doesn't intersect with other
>>> threads:
>>>
>>> 	zonemode=zbd
>>> 	zonesize=...
>>> 	rw=write
>>>
>>> 	numjobs=N
>>> 	offset_increment=M*zonesize
>>>
>>> 	[j]
>>> 	size=M*zonesize
>>>
>>> Patch introduces "global_max_open_zones" which is per-device config
>>> option. "max_open_zones" becomes per-thread limit. Both limits are
>>> checked for each open zone so one thread can't starve others.
>>
>> It makes sense. Nice one.
>>
>> But the change as is will break existing test scripts (e.g. lots of SMR drives
>> are being tested with this).
>
> It won't break single-threaded ones, that's for sure.

Yes, but things like:

fio --ioengine=psync --rw=randwrite --max_open_zones=128 --numjobs=32

will change behavior. With your change, instead of 32 threads writing
randomly to a total of 128 zones, you will get 32 threads each writing
randomly to 128 zones, for a total of 32*128=4096 zones.

SMR drives and zonemode=zbd have now been around for a while, and there
are a lot of fio scripts deployed in production for system
validation/tests, as well as in drive development for testing. If we
can avoid breaking these, we absolutely must. My proposal to keep
max_open_zones as the per-device maximum while introducing a
thread_max_open_zones limit keeps backward compatibility with existing
scripts while still allowing your change.

>> I think we can avoid this breakage simply: leave
>> max_open_zones option definition as is and add "job_max_open_zones" or
>> "thread_max_open_zones" option (no strong feelings about the name here, as long
>> as it is explicit) to define the per thread maximum number of open zones. This
>> new option could actually default to max_open_zones / numjobs if that is not 0.
>
> I'd argue that such scripts are broken.

See the above example. It is a perfectly valid script, not broken at
all. Varying max_open_zones allows measuring how a drive's performance
changes with the number of implicitly open zones. That is a common test
which I have seen a lot in drive development and production, and there
are likely other valid uses too. Assuming that all current uses of
max_open_zones with multi-job workloads are broken would be a mistake.

> If sustained numjobs*max_open_zones QD is desired than it is not
> guaranteed as threads will simply exit at indeterminate times,
> which break LBA space coverage as well.
>
> Right now, numjobs= + max_open_zones= means "max open zones by at most
> "numjobs" threads.

I understand that. And we should keep it that way for the reasons
mentioned above. Extending your change with the thread_max_open_zones
option would nicely enhance it. E.g.:

fio --ioengine=libaio --iodepth=8 --rw=randwrite --thread_max_open_zones=1 --numjobs=8

will result in 8 threads each writing to a single randomly chosen zone
at QD=8. And that is the same as your proposed:

fio --ioengine=libaio --iodepth=8 --rw=randwrite --max_open_zones=1 --numjobs=8

but without breaking the existing meaning of max_open_zones as a per
drive/file limit.
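To make the combined semantics concrete, here is a minimal job file
sketch of how the two limits could interact, assuming the proposed
thread_max_open_zones option name is adopted (it does not exist in
current fio, and the device path and values are only illustrative):

	; Hypothetical job file: 4 jobs, each limited to 2 implicitly
	; open zones, while the device as a whole never has more than
	; 8 zones open. thread_max_open_zones is the proposed per-job
	; option; /dev/sdb stands in for any zoned block device.
	[global]
	filename=/dev/sdb
	direct=1
	ioengine=libaio
	iodepth=8
	zonemode=zbd
	rw=randwrite
	; existing per-device limit, meaning unchanged
	max_open_zones=8
	; proposed per-job limit
	thread_max_open_zones=2

	[writer]
	numjobs=4

With these values, the per-job limit also matches the
max_open_zones / numjobs default suggested above (8 / 4 = 2).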
I totally agree with your change. It is a nice one. But let's preserve
the meaning of max_open_zones as the per-device limit. No need to
change it.

Best regards.

--
Damien Le Moal
Western Digital Research