With zonemode=zbd, for a multi-job workload using asynchronous I/O
engines with a deep I/O queue depth setting, a job that is building a
batch of asynchronous I/Os to submit may end up waiting for an I/O
target zone lock held by another job that is also preparing a batch.
For small devices with few zones and/or a large number of jobs, such
prepare-phase zone lock contention can be frequent enough to lead to a
situation where all jobs are waiting for zone locks held by other jobs
and no I/O is being executed (so no zone is being unlocked).

Avoid this situation by using pthread_mutex_trylock() instead of
pthread_mutex_lock() and by calling io_u_quiesce() to execute queued
I/O units if locking fails. pthread_mutex_lock() is then called to lock
the desired target zone. The execution of io_u_quiesce() forces I/O
execution progress and so zones to be unlocked, avoiding job deadlock.

Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
---
 zbd.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/zbd.c b/zbd.c
index 310b1732..2da742b7 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1255,7 +1255,21 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 	zbd_check_swd(f);
 
-	pthread_mutex_lock(&zb->mutex);
+	/*
+	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
+	 * is changed or when io_u completes and zbd_put_io() executed.
+	 * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
+	 * other waiting for zone locks when building an io_u batch, first
+	 * only trylock the zone. If the zone is already locked by another job,
+	 * process the currently queued I/Os so that I/O progress is made and
+	 * zones unlocked.
+	 */
+	if (pthread_mutex_trylock(&zb->mutex) != 0) {
+		if (!td_ioengine_flagged(td, FIO_SYNCIO))
+			io_u_quiesce(td);
+		pthread_mutex_lock(&zb->mutex);
+	}
+
 	switch (io_u->ddir) {
 	case DDIR_READ:
 		if (td->runstate == TD_VERIFYING) {
-- 
2.20.1
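
For readers who want to experiment with the locking pattern outside of fio,
here is a minimal standalone sketch of the trylock, quiesce, then lock
sequence. It is not fio code: struct zone, struct job, job_quiesce(),
job_lock_zone() and MAX_QUEUED are hypothetical stand-ins, and job_quiesce()
only imitates the effect of io_u_quiesce() by unlocking the zones of the
queued I/Os. The property it is meant to show is that an asynchronous job
never blocks in pthread_mutex_lock() while still holding zone locks for
queued I/Os.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_QUEUED 8	/* hypothetical queue depth of the toy job */

/* Hypothetical stand-in for a zone: only the lock matters here. */
struct zone {
	pthread_mutex_t mutex;
};

/* Hypothetical stand-in for a job building a batch of queued I/Os. */
struct job {
	struct zone *held[MAX_QUEUED];	/* zones locked by queued I/Os */
	int nr_held;
	bool sync_engine;	/* imitates td_ioengine_flagged(td, FIO_SYNCIO) */
	int nr_quiesce;		/* number of times the queue was drained */
};

/*
 * Imitates the effect of io_u_quiesce(): "execute" all queued I/Os,
 * which unlocks the zone each of them targets.
 */
static void job_quiesce(struct job *j)
{
	while (j->nr_held > 0)
		pthread_mutex_unlock(&j->held[--j->nr_held]->mutex);
	j->nr_quiesce++;
}

/*
 * Lock the target zone of the next I/O in the batch. Only trylock first;
 * if the zone is busy, drain the queued I/Os so that the zones this job
 * holds are released before it blocks waiting for the busy zone.
 */
static void job_lock_zone(struct job *j, struct zone *z)
{
	/* Queue full: submit the batch before preparing another I/O. */
	if (j->nr_held == MAX_QUEUED)
		job_quiesce(j);

	if (pthread_mutex_trylock(&z->mutex) != 0) {
		if (!j->sync_engine)
			job_quiesce(j);
		pthread_mutex_lock(&z->mutex);
	}

	assert(j->nr_held < MAX_QUEUED);
	j->held[j->nr_held++] = z;
}
```

With two threads each running a job, one locking zone A then B and the other
B then A, the plain pthread_mutex_lock() version can deadlock, while this
version cannot as long as sync_engine is false: whenever a job blocks, it has
already released every zone it held.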