On Thu, Sep 30, 2021 at 09:02:36AM +0900, Shin'ichiro Kawasaki wrote:
> Test case #46 in t/zbd/test-zbd-support fails when it is repeated
> hundreds of times on null_blk zoned devices. The test case uses libaio
> IO engine to run 8 random write jobs on 4 sequential write required
> zones. When all of the 4 zones get almost full but still open for
> in-flight writes, the helper function zbd_convert_to_open_zone() fails
> to get an opened zone for next write. This results in unexpected job
> termination.
>
> To avoid the unexpected job termination, retry the steps in
> zbd_convert_to_open_zone(). Before retry, call io_u_quiesce() to ensure
> that the in-flight writes get completed.
>
> To prevent infinite loop by the retry, retry only when any IOs are
> in-flight or in-flight IOs get completed. To check in-flight IO count of
> all jobs, add a new helper function any_io_in_flight().
>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
> ---
>  zbd.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/zbd.c b/zbd.c
> index 64415d2b..c0b0b81c 100644
> --- a/zbd.c
> +++ b/zbd.c
> @@ -1204,6 +1204,19 @@ static uint32_t pick_random_zone_idx(const struct fio_file *f,
>  		f->io_size;
>  }
>
> +static bool any_io_in_flight(void)
> +{
> +	struct thread_data *td;
> +	int i;
> +
> +	for_each_td(td, i) {
> +		if (td->io_u_in_flight)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  /*
>   * Modify the offset of an I/O unit that does not refer to an open zone such
>   * that it refers to an open zone. Close an open zone and open a new zone if
> @@ -1223,6 +1236,8 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
>  	uint32_t zone_idx, new_zone_idx;
>  	int i;
>  	bool wait_zone_close;
> +	bool in_flight;
> +	bool should_retry = true;
>
>  	assert(is_valid_offset(f, io_u->offset));
>
> @@ -1337,6 +1352,7 @@ open_other_zone:
>  		io_u_quiesce(td);
>  	}
>
> +retry:
>  	/* Zone 'z' is full, so try to open a new zone. */
>  	for (i = f->io_size / zbdi->zone_size; i > 0; i--) {
>  		zone_idx++;
> @@ -1376,6 +1392,24 @@ open_other_zone:
>  			goto out;
>  		pthread_mutex_lock(&zbdi->mutex);
>  	}
> +
> +	/*
> +	 * When any I/O is in-flight or when all I/Os in-flight get completed,
> +	 * the I/Os might have closed zones then retry the steps to open a zone.
> +	 * Before retry, call io_u_quiesce() to complete in-flight writes.
> +	 */
> +	in_flight = any_io_in_flight();
> +	if (in_flight || should_retry) {
> +		dprint(FD_ZBD, "%s(%s): wait zone close and retry open zones\n",
> +			__func__, f->file_name);
> +		pthread_mutex_unlock(&zbdi->mutex);
> +		zone_unlock(z);
> +		io_u_quiesce(td);
> +		zone_lock(td, f, z);
> +		should_retry = in_flight;
> +		goto retry;
> +	}
> +
>  	pthread_mutex_unlock(&zbdi->mutex);
>  	zone_unlock(z);
>  	dprint(FD_ZBD, "%s(%s): did not open another zone\n", __func__,
> --
> 2.31.1
>

Reviewed-by: Niklas Cassel <niklas.cassel@xxxxxxx>
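
For anyone following the termination argument in the commit message, here is a minimal stand-alone sketch of the retry condition as I read it. It is not fio code: count_scans() and the samples array are hypothetical and only model the in_flight/should_retry bookkeeping, with each sample standing in for what any_io_in_flight() would report after one failed scan over the zones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the retry condition in the patch.
 *
 * samples[k] is the global in-flight I/O count observed after the k-th
 * failed scan; once the samples run out, no I/O is assumed in flight.
 * Returns the number of scans performed before giving up.
 */
int count_scans(const int *samples, size_t n)
{
	bool should_retry = true;	/* same initial value as in the patch */
	size_t k = 0;
	int scans = 0;

	for (;;) {
		bool in_flight;

		scans++;		/* one failed pass over the zones */

		in_flight = k < n && samples[k] > 0;
		if (k < n)
			k++;

		if (!(in_flight || should_retry))
			break;		/* "did not open another zone" */

		/* The real code calls io_u_quiesce() at this point. */
		should_retry = in_flight;
	}

	assert(scans >= 2);		/* the first failure always retries once */
	return scans;
}
```

With no I/O ever in flight the model scans twice and stops. Each observation of in-flight I/O buys one more scan after the I/O has drained, so the loop ends once a scan sees nothing in flight and the previous scan saw nothing either.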