Re: [PATCH] zbd: Fix unexpected job termination by open zone search failure

On Thu, Sep 30, 2021 at 09:02:36AM +0900, Shin'ichiro Kawasaki wrote:
> Test case #46 in t/zbd/test-zbd-support fails when it is repeated
> hundreds of times on null_blk zoned devices. The test case uses libaio
> IO engine to run 8 random write jobs on 4 sequential write required
> zones. When all 4 zones are almost full but still open for in-flight
> writes, the helper function zbd_convert_to_open_zone() fails to find
> an open zone for the next write. This results in unexpected job
> termination.
> 
> To avoid the unexpected job termination, retry the steps in
> zbd_convert_to_open_zone(). Before retry, call io_u_quiesce() to ensure
> that the in-flight writes get completed.
> 
> To keep the retry from looping forever, retry only while any I/O is
> in flight, plus one more time right after the in-flight I/Os have
> completed. To check the in-flight I/O count of all jobs, add a new
> helper function any_io_in_flight().
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
> ---
>  zbd.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/zbd.c b/zbd.c
> index 64415d2b..c0b0b81c 100644
> --- a/zbd.c
> +++ b/zbd.c
> @@ -1204,6 +1204,19 @@ static uint32_t pick_random_zone_idx(const struct fio_file *f,
>  		f->io_size;
>  }
>  
> +static bool any_io_in_flight(void)
> +{
> +	struct thread_data *td;
> +	int i;
> +
> +	for_each_td(td, i) {
> +		if (td->io_u_in_flight)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  /*
>   * Modify the offset of an I/O unit that does not refer to an open zone such
>   * that it refers to an open zone. Close an open zone and open a new zone if
> @@ -1223,6 +1236,8 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
>  	uint32_t zone_idx, new_zone_idx;
>  	int i;
>  	bool wait_zone_close;
> +	bool in_flight;
> +	bool should_retry = true;
>  
>  	assert(is_valid_offset(f, io_u->offset));
>  
> @@ -1337,6 +1352,7 @@ open_other_zone:
>  		io_u_quiesce(td);
>  	}
>  
> +retry:
>  	/* Zone 'z' is full, so try to open a new zone. */
>  	for (i = f->io_size / zbdi->zone_size; i > 0; i--) {
>  		zone_idx++;
> @@ -1376,6 +1392,24 @@ open_other_zone:
>  			goto out;
>  		pthread_mutex_lock(&zbdi->mutex);
>  	}
> +
> +	/*
> +	 * When any I/O is in-flight or when all I/Os in-flight get completed,
> +	 * the I/Os might have closed zones then retry the steps to open a zone.
> +	 * Before retry, call io_u_quiesce() to complete in-flight writes.
> +	 */
> +	in_flight = any_io_in_flight();
> +	if (in_flight || should_retry) {
> +		dprint(FD_ZBD, "%s(%s): wait zone close and retry open zones\n",
> +		       __func__, f->file_name);
> +		pthread_mutex_unlock(&zbdi->mutex);
> +		zone_unlock(z);
> +		io_u_quiesce(td);
> +		zone_lock(td, f, z);
> +		should_retry = in_flight;
> +		goto retry;
> +	}
> +
>  	pthread_mutex_unlock(&zbdi->mutex);
>  	zone_unlock(z);
>  	dprint(FD_ZBD, "%s(%s): did not open another zone\n", __func__,
> -- 
> 2.31.1
> 

Reviewed-by: Niklas Cassel <niklas.cassel@xxxxxxx>
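
For readers following the thread, the termination argument of the retry
can be sketched outside of fio. The block below is a toy model only, not
fio code: struct toy_state and every toy_* function are hypothetical
names invented for illustration, and it assumes a single shared in-flight
counter, whereas the patch checks td->io_u_in_flight of every job and
io_u_quiesce() only waits for the calling job. It mirrors the
"in_flight || should_retry" condition and the "should_retry = in_flight"
update from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the retry logic; none of this is fio code. */
struct toy_state {
	int in_flight;                 /* writes still in flight */
	int free_zones;                /* zones that can be opened right now */
	int zones_freed_on_completion; /* zones that become openable once
					  the in-flight writes complete */
};

/* Stand-in for any_io_in_flight(): true while any write is pending. */
static bool toy_any_io_in_flight(const struct toy_state *s)
{
	return s->in_flight > 0;
}

/* Stand-in for io_u_quiesce(): complete all pending writes. */
static void toy_quiesce(struct toy_state *s)
{
	if (s->in_flight > 0) {
		s->free_zones += s->zones_freed_on_completion;
		s->in_flight = 0;
	}
}

/* Stand-in for one pass of the zone search loop. */
static bool toy_try_open_zone(struct toy_state *s)
{
	if (s->free_zones > 0) {
		s->free_zones--;
		return true;
	}
	return false;
}

/*
 * Search for a zone, retrying under the same condition as the patch.
 * Returns the number of search passes and stores the result in *opened.
 */
static int toy_open_zone_with_retry(struct toy_state *s, bool *opened)
{
	bool should_retry = true;
	int passes = 0;

	assert(s && opened);

	for (;;) {
		bool in_flight;

		passes++;
		if (toy_try_open_zone(s)) {
			*opened = true;
			return passes;
		}

		in_flight = toy_any_io_in_flight(s);
		if (!in_flight && !should_retry)
			break;

		toy_quiesce(s);
		/* Allow exactly one more pass after the last completion. */
		should_retry = in_flight;
	}

	*opened = false;
	return passes;
}
```

In this model the loop cannot spin forever: once no write is in flight,
should_retry permits one more pass and is then cleared, so the search
gives up on the pass after that.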


