On Tue, Mar 23, 2021 at 12:36:40AM -0700, Sagi Grimberg wrote:
>> The process:
>> 1.nvme_ns_head_submit_bio call srcu_read_lock(&head->srcu).
>> 2.nvme_ns_head_submit_bio will add the bio to current->bio_list instead of
>> waiting for the frozen queue.
>
> Nothing guarantees that you have a bio_list active at any point in time,
> in fact for a workload that submits one by one you will always drain
> that list directly in the submission...

It should always be active when ->submit_bio is called.

>
>> 3.nvme_ns_head_submit_bio call srcu_read_unlock(&head->srcu, srcu_idx).
>> So nvme_ns_head_submit_bio do not hold head->srcu long when the queue is
>> frozen, can avoid deadlock.
>>
>> Sagi, suggest trying this patch.
>
> The above reproduces with the patch applied on upstream nvme code.

Weird.  I don't think the deadlock in your original report should
happen due to this.  Can you take a look at the callstacks in the
reproduced deadlock?  Either we're missing something obvious or it is
a somewhat different deadlock.