nvme_stop_queues takes a long time when there are a large number of namespaces. With multipath, if one path fails, failover and retry are delayed for a long time, and the more namespaces there are, the longer the delay. This has a great impact on delay-sensitive services.

There are two options to fix it:
1. Use percpu_ref instead of SRCU (Ming's patchset).
2. Use the tagset quiesce interface with SRCU (Sagi's patchset).

Both patchsets are still pending. This is a serious bug, and I hope we can revisit the solution. We may not have the best option, but we need to choose a relatively acceptable one. Can we fix the bug for non-blocking queues (which are used by fc & rdma) first?

Sagi & Ming, what do you think?

Thank you.

On 2020/10/20 16:55, Ming Lei wrote:
Hi Jens,

The 1st patch add .mq_quiesce_mutex for serializing quiesce/unquiesce,
and prepares for replacing srcu with percpu_ref.

The 2nd patch replaces srcu with percpu_ref.

The 3rd patch adds tagset quiesce interface.

The 4th patch applies tagset quiesce interface for NVMe subsystem.

V8:
	- rebase on latest linus tree, only there is small fuzz change on 2/4

V7:
	- base on latest for-5.10/block, only there is small change on 2/4

V6:
	- base on for-5.10/block directly, instead of being against on
	  patchset of 'percpu_ref & block: reduce memory footprint of
	  percpu_ref in fast path', because these patches don't depend on
	  that patchset.

V5:
	- warn once in case that driver unquiesces its queue being quiesce
	  and not done, only patch 2 is modified

V4:
	- remove .mq_quiesce_mutex, and switch to test_and_[set|clear] for
	  avoiding duplicated quiesce action
	- pass blktests(block, nvme)

V3:
	- add tagset quiesce interface
	- apply tagset quiesce interface for NVMe
	- pass blktests(block, nvme)

V2:
	- add .mq_quiesce_lock
	- add comment on patch 2 wrt. handling hctx_lock() failure
	- trivial patch style change

Ming Lei (3):
  block: use test_and_{clear|test}_bit to set/clear QUEUE_FLAG_QUIESCED
  blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
  blk-mq: add tagset quiesce interface

Sagi Grimberg (1):
  nvme: use blk_mq_[un]quiesce_tagset

 block/blk-core.c         |  13 +++
 block/blk-mq-sysfs.c     |   2 -
 block/blk-mq.c           | 182 +++++++++++++++++++++++++--------------
 block/blk-sysfs.c        |   6 +-
 block/blk.h              |   2 +
 drivers/nvme/host/core.c |  19 ++--
 include/linux/blk-mq.h   |  10 +--
 include/linux/blkdev.h   |   4 +
 8 files changed, 154 insertions(+), 84 deletions(-)

Cc: Hannes Reinecke <hare@xxxxxxx>
Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
Cc: Bart Van Assche <bvanassche@xxxxxxx>
Cc: Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx>
Cc: Chao Leng <lengchao@xxxxxxxxxx>
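
To make the scaling argument in this thread concrete, here is a rough userspace cost model, not kernel code. It assumes that quiescing one queue at a time costs one grace-period wait per queue, while a tagset-style quiesce marks every queue first and then waits once. The names quiesce_cost, quiesce_per_queue, quiesce_batched and estimated_wait_ms are hypothetical and exist only for this sketch; the grace-period length is a parameter, not a measured value.

```c
#include <stddef.h>

/* Hypothetical cost model for illustration only; none of this is a kernel API. */
struct quiesce_cost {
	unsigned long flag_updates;  /* queues marked as quiesced */
	unsigned long grace_periods; /* serialized grace-period waits */
};

/*
 * Per-queue pattern: mark one queue, wait for a grace period,
 * then move on to the next queue. The waits add up linearly.
 */
static struct quiesce_cost quiesce_per_queue(size_t nr_queues)
{
	struct quiesce_cost c = { 0, 0 };

	for (size_t i = 0; i < nr_queues; i++) {
		c.flag_updates++;
		c.grace_periods++;
	}
	return c;
}

/*
 * Batched (tagset-style) pattern: mark every queue first,
 * then wait once for all of them.
 */
static struct quiesce_cost quiesce_batched(size_t nr_queues)
{
	struct quiesce_cost c = { 0, 0 };

	for (size_t i = 0; i < nr_queues; i++)
		c.flag_updates++;
	if (nr_queues > 0)
		c.grace_periods = 1;
	return c;
}

/* Total time spent waiting, given an assumed grace-period length. */
static unsigned long estimated_wait_ms(struct quiesce_cost c,
				       unsigned long grace_period_ms)
{
	return c.grace_periods * grace_period_ms;
}
```

Under this model, estimated_wait_ms(quiesce_per_queue(n), t) grows with the number of namespaces n, while estimated_wait_ms(quiesce_batched(n), t) stays at one grace period for any n > 0, which is the behaviour both pending patchsets aim for.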