On Feb 07, 2023 / 14:06, Niklas Cassel wrote: > On Tue, Feb 07, 2023 at 03:37:37PM +0900, Shin'ichiro Kawasaki wrote: > > The valid data bytes accounting is used for zone_reset_threshold option. > > This accounting usage has two issues. The first issue is unexpected > > zone reset due to different IO ranges. The valid data bytes accounting > > is done for all IO ranges per device, and shared by all jobs. On the > > other hand, the zone_reset_threshold option is defined as the ratio to > > each job's IO range. When a job refers to the accounting value, it > > includes writes to IO ranges out of the job's IO range. Then zone reset > > is triggered earlier than expected. > > > > The second issue is accounting value initialization. The initialization > > of the accounting field is repeated for each job, then the value > > initialized by the first job is overwritten by other jobs. This works as > > expected for single job or multiple jobs with same write range. However, > > when multiple jobs have different write ranges, the overwritten value is > > wrong except for the last job. > > > > To ensure that the accounting works as expected for the option, check > > that write ranges of all jobs are same. If jobs have different write > > ranges, report it as an error. Initialize the accounting field only once > > for the first job. All jobs have same write range, then one time > > initialization is enough. Update man page to clarify this limitation of > > the option. > > > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> > > --- > > HOWTO.rst | 3 ++- > > fio.1 | 3 ++- > > zbd.c | 22 +++++++++++++++++++--- > > zbd.h | 4 ++++ > > 4 files changed, 27 insertions(+), 5 deletions(-) > > > > diff --git a/HOWTO.rst b/HOWTO.rst > > index d08f8a18..c71c3949 100644 > > --- a/HOWTO.rst > > +++ b/HOWTO.rst > > @@ -1088,7 +1088,8 @@ Target file/device > > A number between zero and one that indicates the ratio of written bytes > > to the total size of the zones with write pointers in the IO range. When > > current ratio is above this ratio, zones are reset periodically as > > - :option:`zone_reset_frequency` specifies. > > + :option:`zone_reset_frequency` specifies. If there are multiple jobs when > > + using this option, the IO range for all write jobs shall be the same. > > nit: perhaps replace "shall be the same" with "has to be the same" Will replace. I find "has to" is more consitent in the document than "shall". > > > > > .. option:: zone_reset_frequency=float > > > > diff --git a/fio.1 b/fio.1 > > index 54d2c403..80cd23b5 100644 > > --- a/fio.1 > > +++ b/fio.1 > > @@ -857,7 +857,8 @@ value to be larger than the device reported limit. Default: false. > > A number between zero and one that indicates the ratio of written bytes to the > > total size of the zones with write pointers in the IO range. When current ratio > > is above this ratio, zones are reset periodically as \fBzone_reset_frequency\fR > > -specifies. > > +specifies. If there are multiple jobs when using this option, the IO range for > > +all write jobs shall be the same. > > .TP > > .BI zone_reset_frequency \fR=\fPfloat > > A number between zero and one that indicates how often a zone reset should be > > diff --git a/zbd.c b/zbd.c > > index 6783acf9..ca6816b9 100644 > > --- a/zbd.c > > +++ b/zbd.c > > @@ -1201,10 +1201,26 @@ static uint64_t zbd_set_vdb(struct thread_data *td, const struct fio_file *f) > > { > > struct fio_zone_info *zb, *ze, *z; > > uint64_t wp_vdb = 0; > > + struct zoned_block_device_info *zbdi = f->zbd_info; > > > > if (!accounting_vdb(td, f)) > > return 0; > > > > + if (zbdi->wp_write_min_zone != zbdi->wp_write_max_zone) { > > + if (zbdi->wp_write_min_zone != f->min_zone || > > + zbdi->wp_write_max_zone != f->max_zone) { > > + td_verror(td, EINVAL, > > + "multi-jobs with different write ranges are " > > + "not supported with zone_reset_threshold"); > > + log_err("multi-jobs with different write ranges are " > > + "not supported with zone_reset_threshold\n"); > > + } > > + return 0; > > + } > > This check looks a bit weird. > What if the first job specifies a f->file_offset and an f->io_size that is > within the same zone? Then f->min_zone and f->max_zone will have the same value. > > Which should cause wp_write_min_zone and wp_write_max_zone to have the same > value. > > The check will thus not be performed for the second job, after wp_write_min_zone > and wp_write_min_zone have been initialized. Thanks for catching this. In most cases, f->min_zone and f->max_zone have different values, since f->io_size must be equal to or larger than zone size. However, still f->min_zone and f->max_zone can be same when the IO range fits within a conventional zone. The commit f952800a0b1a is a reference. I suggest following 3 more changes below: 1) Before IO range check in zbd_set_vdb(), ensure that the IO range includes zones with write pointer. This also ensures that f->min_zone and f->max_zone are different. The code diff will be as follows: diff --git a/zbd.c b/zbd.c index a464d6df..f0a0856b 100644 --- a/zbd.c +++ b/zbd.c @@ -1086,6 +1086,13 @@ static uint64_t zbd_set_vdb(struct thread_data *td, const struct fio_file *f) if (!accounting_vdb(td, f)) return 0; + /* + * Ensure that all zones in the I/O range are not conventional so that + * f->min_zone and f->max_zone have different values. + */ + if (!zbd_is_seq_job(f)) + return 0; + if (zbdi->wp_write_min_zone != zbdi->wp_write_max_zone) { if (zbdi->wp_write_min_zone != f->min_zone || zbdi->wp_write_max_zone != f->max_zone) { 2) The names of new fields wp_write_min_zone and wp_write_max_zone are not accurate, since the IO range may cover both coventional zones and zones with write pointers. I will rename them to write_min_zone and write_max_zone without the wp_ prefix. 3) The 4th patch needs modification also. The description of the zone_reset_threshold option should cover the case that IO range has both conventional zones and zones with write pointers. It will be like this: .BI zone_reset_threshold \fR=\fPfloat A number between zero and one that indicates the ratio of written bytes in the zones with write pointers in the IO range to the size of the IO range. -- Shin'ichiro Kawasaki