On 2020/03/18 10:49, Jens Axboe wrote: > On 3/16/20 8:50 PM, Naohiro Aota wrote: >> On Fri, Feb 28, 2020 at 04:12:42PM +0900, Naohiro Aota wrote: >>> With zonemode=zbd and asynchronous ioengine, a thread takes a zone lock >>> before an I/O submission (in zbd_adjust_block() or >>> zbd_convert_to_open_zone()) and releases the lock after the I/O is put (in >>> zbd_put_io()). With a small number of open zones and/or a large number of >>> jobs, threads can easily end up circular lock dependency and deadlocks. For >>> example, thread A sends an I/O to zone 0, so thread A holds a zone lock #0. >>> Then, thread A continues on zone 1 and try to acquire zone lock #1. At the >>> same time, thread B held zone lock #1, sent I/O to zone 1, and try to >>> acquire zone #0. Now, both threads are waiting for each other's lock, which >>> is never released. >>> >>> This series fixes three problems to eliminate the deadlock. First, taking >>> all the zone locks should be avoided. zbd_process_swd() and >>> zbd_reset_zones() take the lock for all zones of the specified device, >>> preventing other threads from accessing different zones in parallel. While >>> it is not the root cause of the deadlock, such all zone locking easily >>> trigger a deadlock. So, this series reduces such contentions by (1) >>> eliminating unnecessary invocation of zbd_process_swd() and (2) changing to >>> take single zone at at time in zbd_reset_zones(). >>> >>> Secondly, zbd's I/O issuing path should expect lock contention with other >>> threads and handle the case properly. Commit 6f0c608564c3 ("zbd: Avoid >>> async I/O multi-job workload deadlock") also addressed this issue by using >>> pthread_mutex_try_lock() and io_u_quiesce(). However, there are more >>> pthread_mutex_lock() left to be fixed in the same way. >>> >>> Finally, fio should clean up I/Os properly on an error case. Currently, >>> cleanup_pending_aio() and io_u_quiesce() fail to clean up I/Os with an >>> error. As a result, zone locks, which are held by an erroneous thread, are >>> kept held and blocks other threads to acquire the locks. >>> >>> This series also add a test case to cause the deadlock with unpatched fio. >>> >>> Patches 1 and 2 avoid long range lock holding to reduce lock contentions. >>> >>> Patch 3 introduces zone_lock and use it to handle the lock contention case. >>> >>> Patches 4 and 5 fix error path to clean up all the pending I/Os left. >>> >>> Patch 6 adds the test. >>> >>> Naohiro Aota (6): >>> zbd: avoid initializing swd when unnecessary >>> zbd: reset one zone at a time >>> zbd: use zone_lock to lock a zone >>> backend: always clean up pending aios >>> io_u: ensure io_u_quiesce() to process all the IOs >>> zbd: add test for stressing zone locking >>> >>> backend.c | 5 --- >>> io_u.c | 6 +-- >>> t/zbd/test-zbd-support | 30 +++++++++++++++ >>> zbd.c | 84 ++++++++++++++++-------------------------- >>> 4 files changed, 65 insertions(+), 60 deletions(-) >>> >>> -- >>> 2.25.1 >> >> Ping on this series (also on this >> https://www.spinics.net/lists/fio/msg08322.html ). > > Damien, can you take a look please? > My apologies. I should have sent a reviewed-by tag earlier. I worked on these with Naohiro (who did all the heavy lifting). Tested too and at least for SMR disks workloads, no problems detected. Reviewed-by: Damien Le Moal <damien.lemoal@xxxxxxx> Tested-by: Damien Le Moal <damien.lemoal@xxxxxxx> -- Damien Le Moal Western Digital Research