@@ -791,7 +791,8 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 * queues are not a live anymore, so restart the queues to fail fast
 * new IO
*/
- blk_mq_start_stopped_hw_queues(ctrl->ctrl.admin_q, true);
+ blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+ blk_mq_kick_requeue_list(ctrl->ctrl.admin_q);
Now the queue won't be stopped via blk_mq_quiesce_queue(), so why do
you add blk_mq_kick_requeue_list() here?
I think you're right.
We now quiesce the queue and fast-fail inflight I/O. In
nvme_complete_rq we call blk_mq_requeue_request with
!blk_mq_queue_stopped(req->q), which is now true because the queue
is quiesced rather than stopped.
So the requeue_work is triggered and requeues the request,
and when we unquiesce we simply run the hw queues again.
If we were to call it with !blk_queue_quiesced(req->q),
I think the kick would be needed though...
If you look at nvme_start_queues, it also kicks the requeue
work. I think the proper fix for this is to _keep_ the
requeue kick and, in nvme_complete_rq, call:
blk_mq_requeue_request(req, !blk_queue_quiesced(req->q));
Thoughts?
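
To make the difference between the two predicates concrete, here is a
toy userspace model of the behavior discussed above. It is not kernel
code: struct toy_queue, the toy_* helpers and parked_after_unquiesce()
are hypothetical names made up for illustration, and they only mimic
the stopped/quiesced flags and the requeue list as described in this
thread.

```c
#include <stdbool.h>

/* Toy stand-in for a request queue; NOT the kernel's struct request_queue. */
struct toy_queue {
	bool stopped;      /* old scheme: hw queues stopped */
	bool quiesced;     /* new scheme: queue quiesced */
	int requeue_list;  /* requests parked on the requeue list */
	int staged;        /* requests taken off the requeue list by a kick */
};

/* Models a requeue-list kick: parked requests are handed back for dispatch. */
static void toy_kick_requeue_list(struct toy_queue *q)
{
	q->staged += q->requeue_list;
	q->requeue_list = 0;
}

/* Models blk_mq_requeue_request(req, kick): park the request, optionally kick. */
static void toy_requeue_request(struct toy_queue *q, bool kick)
{
	q->requeue_list++;
	if (kick)
		toy_kick_requeue_list(q);
}

/*
 * Error recovery with the patch applied: the queue is quiesced (never
 * stopped), one request is requeued from the completion path, and then
 * the queue is unquiesced. Returns how many requests are still parked
 * on the requeue list afterwards.
 */
int parked_after_unquiesce(bool use_quiesced_check, bool kick_on_unquiesce)
{
	struct toy_queue q = { .stopped = false, .quiesced = true };
	bool kick = use_quiesced_check ? !q.quiesced : !q.stopped;

	toy_requeue_request(&q, kick);

	q.quiesced = false;                 /* unquiesce */
	if (kick_on_unquiesce)
		toy_kick_requeue_list(&q);  /* the extra kick in question */

	return q.requeue_list;
}
```

In this model, with the !stopped check the request is kicked right away
and nothing stays parked, so the extra kick is redundant. With the
!quiesced check the request stays parked until something kicks the
requeue list, which is the case where keeping the kick after unquiesce
matters.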