On Tue, Feb 07, 2023 at 03:37:37PM +0900, Shin'ichiro Kawasaki wrote: > The valid data bytes accounting is used for zone_reset_threshold option. > This accounting usage has two issues. The first issue is unexpected > zone reset due to different IO ranges. The valid data bytes accounting > is done for all IO ranges per device, and shared by all jobs. On the > other hand, the zone_reset_threshold option is defined as the ratio to > each job's IO range. When a job refers to the accounting value, it > includes writes to IO ranges out of the job's IO range. Then zone reset > is triggered earlier than expected. > > The second issue is accounting value initialization. The initialization > of the accounting field is repeated for each job, then the value > initialized by the first job is overwritten by other jobs. This works as > expected for single job or multiple jobs with same write range. However, > when multiple jobs have different write ranges, the overwritten value is > wrong except for the last job. > > To ensure that the accounting works as expected for the option, check > that write ranges of all jobs are same. If jobs have different write > ranges, report it as an error. Initialize the accounting field only once > for the first job. All jobs have same write range, then one time > initialization is enough. Update man page to clarify this limitation of > the option. > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> > --- > HOWTO.rst | 3 ++- > fio.1 | 3 ++- > zbd.c | 22 +++++++++++++++++++--- > zbd.h | 4 ++++ > 4 files changed, 27 insertions(+), 5 deletions(-) > > diff --git a/HOWTO.rst b/HOWTO.rst > index d08f8a18..c71c3949 100644 > --- a/HOWTO.rst > +++ b/HOWTO.rst > @@ -1088,7 +1088,8 @@ Target file/device > A number between zero and one that indicates the ratio of written bytes > to the total size of the zones with write pointers in the IO range. When > current ratio is above this ratio, zones are reset periodically as > - :option:`zone_reset_frequency` specifies. > + :option:`zone_reset_frequency` specifies. If there are multiple jobs when > + using this option, the IO range for all write jobs shall be the same. nit: perhaps replace "shall be the same" with "has to be the same" > > .. option:: zone_reset_frequency=float > > diff --git a/fio.1 b/fio.1 > index 54d2c403..80cd23b5 100644 > --- a/fio.1 > +++ b/fio.1 > @@ -857,7 +857,8 @@ value to be larger than the device reported limit. Default: false. > A number between zero and one that indicates the ratio of written bytes to the > total size of the zones with write pointers in the IO range. When current ratio > is above this ratio, zones are reset periodically as \fBzone_reset_frequency\fR > -specifies. > +specifies. If there are multiple jobs when using this option, the IO range for > +all write jobs shall be the same. > .TP > .BI zone_reset_frequency \fR=\fPfloat > A number between zero and one that indicates how often a zone reset should be > diff --git a/zbd.c b/zbd.c > index 6783acf9..ca6816b9 100644 > --- a/zbd.c > +++ b/zbd.c > @@ -1201,10 +1201,26 @@ static uint64_t zbd_set_vdb(struct thread_data *td, const struct fio_file *f) > { > struct fio_zone_info *zb, *ze, *z; > uint64_t wp_vdb = 0; > + struct zoned_block_device_info *zbdi = f->zbd_info; > > if (!accounting_vdb(td, f)) > return 0; > > + if (zbdi->wp_write_min_zone != zbdi->wp_write_max_zone) { > + if (zbdi->wp_write_min_zone != f->min_zone || > + zbdi->wp_write_max_zone != f->max_zone) { > + td_verror(td, EINVAL, > + "multi-jobs with different write ranges are " > + "not supported with zone_reset_threshold"); > + log_err("multi-jobs with different write ranges are " > + "not supported with zone_reset_threshold\n"); > + } > + return 0; > + } This check looks a bit weird. What if the first job specifies a f->file_offset and an f->io_size that is within the same zone? Then f->min_zone and f->max_zone will have the same value. Which should cause wp_write_min_zone and wp_write_max_zone to have the same value. The check will thus not be performed for the second job, after wp_write_min_zone and wp_write_min_zone have been initialized. > + > + zbdi->wp_write_min_zone = f->min_zone; > + zbdi->wp_write_max_zone = f->max_zone; > + > zb = zbd_get_zone(f, f->min_zone); > ze = zbd_get_zone(f, f->max_zone); > for (z = zb; z < ze; z++) { > @@ -1214,9 +1230,9 @@ static uint64_t zbd_set_vdb(struct thread_data *td, const struct fio_file *f) > } > } > > - pthread_mutex_lock(&f->zbd_info->mutex); > - f->zbd_info->wp_valid_data_bytes = wp_vdb; > - pthread_mutex_unlock(&f->zbd_info->mutex); > + pthread_mutex_lock(&zbdi->mutex); > + zbdi->wp_valid_data_bytes = wp_vdb; > + pthread_mutex_unlock(&zbdi->mutex); > > for (z = zb; z < ze; z++) > if (z->has_wp) > diff --git a/zbd.h b/zbd.h > index 20b2fe17..d4f81325 100644 > --- a/zbd.h > +++ b/zbd.h > @@ -55,6 +55,8 @@ struct fio_zone_info { > * num_open_zones). > * @zone_size: size of a single zone in bytes. > * @wp_valid_data_bytes: total size of data in zones with write pointers > + * @wp_write_min_zone: Minimum zone index of all job's write ranges. Inclusive. > + * @wp_write_max_zone: Maximum zone index of all job's write ranges. Exclusive. > * @zone_size_log2: log2 of the zone size in bytes if it is a power of 2 or 0 > * if the zone size is not a power of 2. > * @nr_zones: number of zones > @@ -75,6 +77,8 @@ struct zoned_block_device_info { > pthread_mutex_t mutex; > uint64_t zone_size; > uint64_t wp_valid_data_bytes; > + uint32_t wp_write_min_zone; > + uint32_t wp_write_max_zone; > uint32_t zone_size_log2; > uint32_t nr_zones; > uint32_t refcount; > -- > 2.38.1 >