On Mon, Mar 18, 2019 at 08:04:55AM -0700, Bart Van Assche wrote:
> On Mon, 2019-03-18 at 15:38 +0800, Ming Lei wrote:
> > On Sun, Mar 17, 2019 at 09:09:09PM -0700, Bart Van Assche wrote:
> > > On 3/17/19 8:29 PM, Ming Lei wrote:
> > > > NVMe's error handler follows the typical steps for tearing down
> > > > hardware:
> > > >
> > > > 1) stop blk_mq hw queues
> > > > 2) stop the real hw queues
> > > > 3) cancel in-flight requests via
> > > >    blk_mq_tagset_busy_iter(tags, cancel_request, ...)
> > > >    cancel_request():
> > > >        mark the request as aborted
> > > >        blk_mq_complete_request(req);
> > > > 4) destroy real hw queues
> > > >
> > > > However, there may be a race between #3 and #4, because
> > > > blk_mq_complete_request() actually completes the request
> > > > asynchronously.
> > > >
> > > > This patch introduces blk_mq_complete_request_sync() for fixing the
> > > > above race.
> > >
> > > Other block drivers wait until outstanding requests have completed by
> > > calling blk_cleanup_queue() before hardware queues are destroyed. Why
> > > can't the NVMe driver follow that approach?
> >
> > The teardown of the controller can be done in the error handler, in
> > which case the request queues may not have been cleaned up. Almost all
> > NVMe controllers' error handling follows the above steps, for example:
> >
> > nvme_rdma_error_recovery_work()
> >     ->nvme_rdma_teardown_io_queues()
> >
> > nvme_timeout()
> >     ->nvme_dev_disable()
>
> Hi Ming,
>
> This makes me wonder whether the current design of the NVMe core is the
> best design we can come up with. The structure of e.g. the SRP initiator
> and target drivers is similar to the NVMeOF drivers. However, there is no
> need in the SRP initiator driver to terminate requests synchronously. Is
> this due to differences in the error handling approaches in the SCSI and
> NVMe core drivers?

I am not familiar with SRP; could you explain what the SRP initiator
driver does when the controller is in a bad state, especially how it
deals with in-flight IO requests in that situation?

As far as I can tell, I don't see an obvious design issue in the NVMe host
drivers, which try their best to recover the controller and to retry and
complete all in-flight IO.

Thanks,
Ming
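
[Editor's illustration] Below is a minimal sketch of the four-step teardown
sequence and cancel callback described in the quoted patch description,
written for a hypothetical blk-mq transport driver. All foo_* names and
struct foo_cmd are placeholders, not actual NVMe code; only
blk_mq_quiesce_queue(), blk_mq_tagset_busy_iter(), blk_mq_rq_to_pdu() and
blk_mq_complete_request() are real blk-mq APIs, with the callback signature
as it existed around v5.1.

#include <linux/blk-mq.h>

/* Hypothetical controller and per-request (pdu) state. */
struct foo_ctrl {
	struct request_queue	*queue;		/* I/O queue exposed to blk-mq */
	struct blk_mq_tag_set	tag_set;
};

struct foo_cmd {
	bool	aborted;
};

/* Transport-specific placeholders; real drivers talk to the hardware here. */
static void foo_stop_hw_queues(struct foo_ctrl *ctrl) { }
static void foo_destroy_hw_queues(struct foo_ctrl *ctrl) { }

/* Step #3 callback: invoked once per in-flight request. */
static bool foo_cancel_request(struct request *req, void *data, bool reserved)
{
	struct foo_cmd *cmd = blk_mq_rq_to_pdu(req);

	cmd->aborted = true;		/* "mark the request as aborted" */
	blk_mq_complete_request(req);	/* may finish via IPI/softirq, i.e.
					 * asynchronously */
	return true;			/* keep iterating over busy tags */
}

static void foo_teardown(struct foo_ctrl *ctrl)
{
	/* 1) stop blk-mq hw queues so no new requests are dispatched */
	blk_mq_quiesce_queue(ctrl->queue);

	/* 2) stop the real hw queues (transport specific) */
	foo_stop_hw_queues(ctrl);

	/* 3) cancel all in-flight requests */
	blk_mq_tagset_busy_iter(&ctrl->tag_set, foo_cancel_request, ctrl);

	/*
	 * 4) destroy the real hw queues. Because step 3 may complete
	 * requests asynchronously, this step can race with those
	 * completions -- the race the quoted patch addresses with
	 * blk_mq_complete_request_sync().
	 */
	foo_destroy_hw_queues(ctrl);
}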