On 3/17/2019 9:09 PM, Bart Van Assche wrote:
On 3/17/19 8:29 PM, Ming Lei wrote:
In NVMe's error handler, we follow the typical steps for tearing down
the hardware:
1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
   blk_mq_tagset_busy_iter(tags, cancel_request, ...)
   cancel_request():
       mark the request as aborted
       blk_mq_complete_request(req);
4) destroy real hw queues
However, there may be a race between #3 and #4, because
blk_mq_complete_request() actually completes the request
asynchronously.
This patch introduces blk_mq_complete_request_sync() to fix the
above race.
Other block drivers wait until outstanding requests have completed by
calling blk_cleanup_queue() before hardware queues are destroyed. Why
can't the NVMe driver follow that approach?
Speaking for the fabrics drivers, not necessarily PCI:
The intent of this looping, which happens immediately after an error
is detected, is to force the termination of the outstanding requests.
Otherwise, the only recourse is to wait for the I/Os to finish, which
they may never do, or to have their upper-level timeouts expire to
cause their termination - thus a very long delay. And one of the
commands on the admin queue - a different tag set, but handled the
same way - doesn't have a timeout (the Async Event Request command),
so it wouldn't necessarily clear without this looping.
-- james