On 1/26/21 6:52 PM, Joseph Qi wrote: > > > On 1/27/21 7:35 AM, Pavel Begunkov wrote: >> Joseph reports following deadlock: >> >> CPU0: >> ... >> io_kill_linked_timeout // &ctx->completion_lock >> io_commit_cqring >> __io_queue_deferred >> __io_queue_async_work >> io_wq_enqueue >> io_wqe_enqueue // &wqe->lock >> >> CPU1: >> ... >> __io_uring_files_cancel >> io_wq_cancel_cb >> io_wqe_cancel_pending_work // &wqe->lock >> io_cancel_task_cb // &ctx->completion_lock >> >> Only __io_queue_deferred() calls queue_async_work() while holding >> ctx->completion_lock, enqueue drained requests via io_req_task_queue() >> instead. >> > We should follow &wqe->lock > &ctx->completion_lock from now on, right? > I was thinking getting completion_lock first before:( > > Moreover, there are so many locks and no suggested locking order in > comments, so that it is hard for us to participate in the work. So many locks? There's really not even a handful, and only a few that have overlaps (and hence ordering). Other sub-systems are have a crap ton more :-). But I can document the ordering, at least that's a start even if things like that tend to go stale. Thanks for reporting and testing! I'll queue this up for 5.11. -- Jens Axboe