Re: [PATCH 1/2] blk-mq: introduce blk_mq_complete_request_sync()

On Mon, 2019-03-18 at 15:38 +0800, Ming Lei wrote:
> On Sun, Mar 17, 2019 at 09:09:09PM -0700, Bart Van Assche wrote:
> > On 3/17/19 8:29 PM, Ming Lei wrote:
> > > NVMe's error handler follows the typical steps below for tearing down
> > > hardware:
> > > 
> > > 1) stop blk_mq hw queues
> > > 2) stop the real hw queues
> > > 3) cancel in-flight requests via
> > > 	blk_mq_tagset_busy_iter(tags, cancel_request, ...)
> > > cancel_request():
> > > 	mark the request as aborted
> > > 	blk_mq_complete_request(req);
> > > 4) destroy real hw queues
> > > 
> > > However, there may be a race between #3 and #4, because
> > > blk_mq_complete_request() actually completes the request asynchronously.
> > > 
> > > This patch introduces blk_mq_complete_request_sync() to fix the
> > > above race.
> > 
> > Other block drivers wait until outstanding requests have completed by
> > calling blk_cleanup_queue() before hardware queues are destroyed. Why can't
> > the NVMe driver follow that approach?
> 
> The controller can be torn down in the error handler, where the request
> queues may not have been cleaned up yet. Almost every kind of NVMe
> controller's error handling follows the above steps, for example:
> 
> nvme_rdma_error_recovery_work()
> 	->nvme_rdma_teardown_io_queues()
> 
> nvme_timeout()
> 	->nvme_dev_disable
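
For context, a rough sketch of the sequence quoted above. The helper names
follow the in-tree NVMe code (nvme_cancel_request(), nvme_stop_queues()), but
the bodies and the teardown_io_queues() wrapper are simplified for
illustration, and blk_mq_complete_request_sync() is the function this patch
proposes, not an existing API:

#include <linux/blk-mq.h>
#include "nvme.h"	/* nvme_req(), NVME_SC_ABORT_REQ, nvme_stop_queues() */

/* Step 3's iterator callback: abort one in-flight request. */
static bool nvme_cancel_request(struct request *req, void *data, bool reserved)
{
	/* mark the request as aborted */
	nvme_req(req)->status = NVME_SC_ABORT_REQ;

	/*
	 * blk_mq_complete_request() may bounce the completion to the
	 * submitting CPU via IPI/softirq, so it can still be running
	 * after this callback returns; the proposed _sync variant runs
	 * the completion handler in the current context instead.
	 */
	blk_mq_complete_request_sync(req);
	return true;
}

static void teardown_io_queues(struct nvme_ctrl *ctrl)
{
	nvme_stop_queues(ctrl);		/* 1) stop blk_mq hw queues */

	/* 2) stop the real hw queues (transport specific) */

	/* 3) cancel in-flight requests */
	blk_mq_tagset_busy_iter(ctrl->tagset, nvme_cancel_request, ctrl);

	/*
	 * 4) destroy real hw queues -- only safe once every cancelled
	 *    request's completion has actually finished running.
	 */
}

With plain blk_mq_complete_request() in the callback, step 4) can free the
hardware queues while a completion from step 3) is still in flight on another
CPU; completing synchronously makes step 3) a barrier that step 4) can rely on.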

Hi Ming,

This makes me wonder whether the current design of the NVMe core is the best
design we can come up with. The structure of e.g. the SRP initiator and target
drivers is similar to that of the NVMeOF drivers, yet there is no need in the
SRP initiator driver to terminate requests synchronously. Is this due to
differences in the error handling approaches of the SCSI and NVMe core drivers?

Thanks,

Bart.
