On 2021/3/23 15:36, Sagi Grimberg wrote:
I check it again. I still think the below patch can avoid the bug.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5a6c35f9af416114588298aa7a90b15bbed15a41
I don't understand what you are saying...
The process:
1.nvme_ns_head_submit_bio call srcu_read_lock(&head->srcu).
2.nvme_ns_head_submit_bio will add the bio to current->bio_list instead of waiting for the frozen queue.
Nothing guarantees that you have a bio_list active at any point in time,
in fact for a workload that submits one by one you will always drain
that list directly in the submission...
submit_bio and nvme_requeue_work both guarantee current->bio_list.
The process:
1.submit_bio and nvme_requeue_work will call submit_bio_noacct.
2.submit_bio_noacct will call __submit_bio_noacct because bio->bi_disk->fops->submit_bio = nvme_ns_head_submit_bio.
3.__submit_bio_noacct set current->bio_list, and then __submit_bio will call bio->bi_disk->fops->submit_bio(nvme_ns_head_submit_bio)
4.nvme_ns_head_submit_bio will add the bio to current->bio_list.
5.__submit_bio_noacct drain current->bio_list.
when drain current->bio_list, it will wait for the frozen queue but do not hold the head->srcu.
Because it call blk_mq_submit_bio directly instead of ->submit_bio(nvme_ns_head_submit_bio).
So it is safe.
3.nvme_ns_head_submit_bio call srcu_read_unlock(&head->srcu, srcu_idx).
So nvme_ns_head_submit_bio do not hold head->srcu long when the queue is frozen, can avoid deadlock.
Sagi, suggest trying this patch.
The above reproduces with the patch applied on upstream nvme code.The new patch(blk_mq_submit_bio_direct) will cause the bug again.
Because it revert add the bio to current->bio_list.
Just try the upstream nvme code, and do not apply the new patch(blk_mq_submit_bio_direct).
.