Re: [PATCH] zbd: Fix unexpected job termination by open zone search failure

On Thu, 2021-09-30 at 09:02 +0900, Shin'ichiro Kawasaki wrote:
> Test case #46 in t/zbd/test-zbd-support fails when it is repeated
> hundreds of times on null_blk zoned devices. The test case uses the
> libaio I/O engine to run 8 random write jobs on 4 sequential write
> required zones. When all 4 zones are almost full but still open for
> in-flight writes, the helper function zbd_convert_to_open_zone() fails
> to get an open zone for the next write. This results in unexpected job
> termination.
> 
> To avoid the unexpected job termination, retry the steps in
> zbd_convert_to_open_zone(). Before retrying, call io_u_quiesce() to
> ensure that the in-flight writes complete.
> 
> To prevent an infinite retry loop, retry only when some I/Os are in
> flight or when in-flight I/Os have just completed. To check the
> in-flight I/O count of all jobs, add a new helper function
> any_io_in_flight().
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>

Looks good,
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@xxxxxxx>

> ---
>  zbd.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/zbd.c b/zbd.c
> index 64415d2b..c0b0b81c 100644
> --- a/zbd.c
> +++ b/zbd.c
> @@ -1204,6 +1204,19 @@ static uint32_t pick_random_zone_idx(const struct fio_file *f,
>                 f->io_size;
>  }
>  
> +static bool any_io_in_flight(void)
> +{
> +       struct thread_data *td;
> +       int i;
> +
> +       for_each_td(td, i) {
> +               if (td->io_u_in_flight)
> +                       return true;
> +       }
> +
> +       return false;
> +}
> +
>  /*
>   * Modify the offset of an I/O unit that does not refer to an open zone such
>   * that it refers to an open zone. Close an open zone and open a new zone if
> @@ -1223,6 +1236,8 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
>         uint32_t zone_idx, new_zone_idx;
>         int i;
>         bool wait_zone_close;
> +       bool in_flight;
> +       bool should_retry = true;
>  
>         assert(is_valid_offset(f, io_u->offset));
>  
> @@ -1337,6 +1352,7 @@ open_other_zone:
>                 io_u_quiesce(td);
>         }
>  
> +retry:
>         /* Zone 'z' is full, so try to open a new zone. */
>         for (i = f->io_size / zbdi->zone_size; i > 0; i--) {
>                 zone_idx++;
> @@ -1376,6 +1392,24 @@ open_other_zone:
>                         goto out;
>                 pthread_mutex_lock(&zbdi->mutex);
>         }
> +
> +       /*
> +        * When any I/O is in-flight or when all I/Os in-flight get completed,
> +        * the I/Os might have closed zones then retry the steps to open a zone.
> +        * Before retry, call io_u_quiesce() to complete in-flight writes.
> +        */
> +       in_flight = any_io_in_flight();
> +       if (in_flight || should_retry) {
> +               dprint(FD_ZBD, "%s(%s): wait zone close and retry open zones\n",
> +                      __func__, f->file_name);
> +               pthread_mutex_unlock(&zbdi->mutex);
> +               zone_unlock(z);
> +               io_u_quiesce(td);
> +               zone_lock(td, f, z);
> +               should_retry = in_flight;
> +               goto retry;
> +       }
> +
>         pthread_mutex_unlock(&zbdi->mutex);
>         zone_unlock(z);
>         dprint(FD_ZBD, "%s(%s): did not open another zone\n", __func__,