I checked it again. I still think the patch below can avoid the bug.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5a6c35f9af416114588298aa7a90b15bbed15a41
I don't understand what you are saying...
The process:
1. nvme_ns_head_submit_bio calls srcu_read_lock(&head->srcu).
2. nvme_ns_head_submit_bio adds the bio to current->bio_list instead
of waiting for the frozen queue.
Nothing guarantees that you have a bio_list active at any point in time.
In fact, for a workload that submits one by one you will always drain
that list directly in the submission...
3. nvme_ns_head_submit_bio calls srcu_read_unlock(&head->srcu, srcu_idx).
So nvme_ns_head_submit_bio does not hold head->srcu for long when the
queue is frozen, which avoids the deadlock.
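
For illustration only, the two cases argued over in steps 1-3 can be
modelled in user space. This is a toy sketch, not kernel code: every name
in it (model_task, model_head_submit, the outcome values) is hypothetical,
and it assumes that a submitter with no active bio_list handles the bio
itself and may therefore wait on a frozen queue before it reaches the
unlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model of the discussion; all names are hypothetical. */
enum outcome {
	DISPATCHED,       /* queue was live, bio went straight through */
	DEFERRED,         /* bio parked on the caller's pending list */
	BLOCKED_IN_SRCU,  /* submitter would wait inside the read section */
};

struct model_task {
	bool bio_list_active;  /* stands in for current->bio_list != NULL */
	int deferred;          /* number of bios parked on that list */
};

static enum outcome model_head_submit(struct model_task *t, bool queue_frozen)
{
	enum outcome ret;

	/* Step 1: srcu_read_lock(&head->srcu) would be taken here. */

	if (t->bio_list_active) {
		/* Step 2: a list is active, so the bio is only queued. */
		t->deferred++;
		ret = DEFERRED;
	} else if (queue_frozen) {
		/* No list is active (assumption of this model): the
		 * submitter processes the bio itself and waits for the
		 * frozen queue while still inside the read section. */
		ret = BLOCKED_IN_SRCU;
	} else {
		ret = DISPATCHED;
	}

	/* Step 3: srcu_read_unlock() would happen here. In the
	 * BLOCKED_IN_SRCU case a real submitter would not get this far
	 * until the queue is unfrozen. */
	return ret;
}
```

In this model the read section stays short only when bio_list_active is
true; with it false and the queue frozen, the submitter stalls before the
unlock, which is the case raised in the reply to step 2.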
Sagi, I suggest trying this patch.
The above reproduces with the patch applied on upstream nvme code.