When zonemode is zbd and block size is not divisor of zone size, write target zone selection does not work as expected. When the write is random write and the device has max open zone limit, the random write is repeated to the zones selected up to the max open zone limit. All writes are repeated only to the zones. When the write is sequential write, write is done only for the first zone. The cause of such unexpected zone selection is current write target zone selection logic. It selects write target zones within open zones. When block size is not divisor of zone size, the selected open zone has only remainder of writable blocks smaller than the block size. Fio reset such zone after zone selection and continue writing to it. This zone reset is required not to exceed the limit of max_open_zones option or max_active_zone limit of the zoned device, but it does not simulate the workload. To avoid the zone reset and unexpected write to same zone, fix write target zone handling of zones with remainder smaller than write block size. Do not reset but finish such zone not to exceed the max_open_zones option and max_active_zone limit. Then choose the next to the finished zone as the write target. To implement this, add helper functions zbd_zone_remainder_not_writable() and zbd_finish_zone(). Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> --- zbd.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 78 insertions(+) diff --git a/zbd.c b/zbd.c index 627fb968..55c5c751 100644 --- a/zbd.c +++ b/zbd.c @@ -70,6 +70,15 @@ static inline uint64_t zbd_zone_capacity_end(const struct fio_zone_info *z) return z->start + z->capacity; } +/** + * zbd_zone_remainder - Return bytes can be written to the zone. + * @z: zone info pointer. + */ +static inline uint64_t zbd_zone_remainder(struct fio_zone_info *z) +{ + return zbd_zone_capacity_end(z) - z->wp; +} + /** * zbd_zone_full - verify whether a minimum number of bytes remain in a zone * @f: file pointer. @@ -322,6 +331,44 @@ static void zbd_close_zone(struct thread_data *td, const struct fio_file *f, z->open = 0; } +/** + * zbd_finish_zone - finish the specified zone + * @td: FIO thread data. + * @f: FIO file for which to finish a zone + * @z: Zone to finish. + * + * Finish the zone at @offset with open or close status. + */ +static int zbd_finish_zone(struct thread_data *td, struct fio_file *f, + struct fio_zone_info *z) +{ + uint64_t offset = z->start; + uint64_t length = f->zbd_info->zone_size; + int ret = 0; + + switch (f->zbd_info->model) { + case ZBD_HOST_AWARE: + case ZBD_HOST_MANAGED: + if (td->io_ops && td->io_ops->finish_zone) + ret = td->io_ops->finish_zone(td, f, offset, length); + else + ret = blkzoned_finish_zone(td, f, offset, length); + break; + default: + break; + } + + if (ret < 0) { + td_verror(td, errno, "finish zone failed"); + log_err("%s: finish zone at sector %"PRIu64" failed (%d).\n", + f->file_name, offset >> 9, errno); + } else { + z->wp = (z+1)->start; + } + + return ret; +} + /** * zbd_reset_zones - Reset a range of zones. * @td: fio thread data. @@ -1941,6 +1988,33 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u) goto eof; } +retry: + if (zbd_zone_remainder(zb) > 0 && + zbd_zone_remainder(zb) < min_bs) { + pthread_mutex_lock(&f->zbd_info->mutex); + zbd_close_zone(td, f, zb); + pthread_mutex_unlock(&f->zbd_info->mutex); + dprint(FD_ZBD, + "%s: finish zone %d\n", + f->file_name, zbd_zone_idx(f, zb)); + io_u_quiesce(td); + zbd_finish_zone(td, f, zb); + if (zbd_zone_idx(f, zb) + 1 >= f->max_zone) { + if (!td_random(td)) + goto eof; + } + zone_unlock(zb); + + /* choose next zone with write pointer */ + do { + zb++; + if (zbd_zone_idx(f, zb) >= f->max_zone) + zb = zbd_get_zone(f, f->min_zone); + } while (!zb->has_wp); + + zone_lock(td, f, zb); + } + if (!zbd_open_zone(td, f, zb)) { zone_unlock(zb); zb = zbd_convert_to_open_zone(td, io_u); @@ -1951,6 +2025,10 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u) } } + if (zbd_zone_remainder(zb) > 0 && + zbd_zone_remainder(zb) < min_bs) + goto retry; + /* Check whether the zone reset threshold has been exceeded */ if (td->o.zrf.u.f) { if (zbdi->wp_sectors_with_data >= f->io_size * td->o.zrt.u.f && -- 2.37.1