[PATCH 7/8] zbd: Avoid async I/O multi-job workload deadlock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



With zonemode=zbd, for a multi-job workload using asynchronous I/O
engines with a deep I/O queue depth setting, a job that is building a
batch of asynchronous I/Os to submit may end up waiting for an I/O
target zone lock held by another job that is also preparing a batch.
For small devices with few zones and/or a large number of jobs, such
prepare phase zone lock contention can be frequent enough to end up in a
situation where all jobs are waiting for zone locks held by other jobs
and no I/O being executed (so no zone being unlocked).

Avoid this situation by using pthread_mutex_trylock() instead of
pthread_mutex_lock() and by calling io_u_quiesce() to execute queued
I/O units if locking fails. pthread_mutex_lock() is then called to
lock the desired target zone. The execution of io_u_quiesce() forces
I/O execution progress and so zones to be unlocked, avoiding job
deadlock.

Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
---
 zbd.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/zbd.c b/zbd.c
index 310b1732..2da742b7 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1255,7 +1255,21 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 	zbd_check_swd(f);
 
-	pthread_mutex_lock(&zb->mutex);
+	/*
+	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
+	 * is changed or when io_u completes and zbd_put_io() executed.
+	 * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
+	 * other waiting for zone locks when building an io_u batch, first
+	 * only trylock the zone. If the zone is already locked by another job,
+	 * process the currently queued I/Os so that I/O progress is made and
+	 * zones unlocked.
+	 */
+	if (pthread_mutex_trylock(&zb->mutex) != 0) {
+		if (!td_ioengine_flagged(td, FIO_SYNCIO))
+			io_u_quiesce(td);
+		pthread_mutex_lock(&zb->mutex);
+	}
+
 	switch (io_u->ddir) {
 	case DDIR_READ:
 		if (td->runstate == TD_VERIFYING) {
-- 
2.20.1




[Index of Archives]     [Linux Kernel]     [Linux SCSI]     [Linux IDE]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux