On 9/8/20 10:15 AM, Ming Lei wrote:
> In case of BLK_MQ_F_BLOCKING, blk-mq uses SRCU to mark the read-side
> critical section while dispatching requests, so request queue quiesce is
> based on SRCU. What we want is a low cost added to the fast path.
>
> With percpu-ref, the implementation is cleaner, simpler, and sufficient
> for queue quiesce. The main requirement is to make sure all read sections
> observe QUEUE_FLAG_QUIESCED once blk_mq_quiesce_queue() returns.
>
> It also becomes much easier to add an interface for async queue quiesce.
>
> Meanwhile, the memory footprint can be reduced with a per-request-queue
> percpu-ref.
>
> From the implementation viewpoint, percpu_ref does not appear to be slower
> than SRCU in the fast path, and srcu tree (the default option in most
> distributions) could be slower since a memory barrier is required in both
> lock & unlock, and rcu_read_lock()/rcu_read_unlock() should be much
> cheaper than smp_mb().
>
> 1) percpu_ref just holds the rcu_read_lock, then runs a check &
> increase/decrease on the percpu variable:
>
>     rcu_read_lock()
>     if (__ref_is_percpu(ref, &percpu_count))
>         this_cpu_inc(*percpu_count);
>     rcu_read_unlock()
>
> 2) srcu tree:
>     idx = READ_ONCE(ssp->srcu_idx) & 0x1;
>     this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
>     smp_mb(); /* B */  /* Avoid leaking the critical section. */
>
> Also, in my test on null_blk (blocking), percpu-ref did not perform
> worse than srcu; see the following test:
>
> 1) test steps:
>
> rmmod null_blk > /dev/null 2>&1
> modprobe null_blk nr_devices=1 submit_queues=1 blocking=1
> fio --bs=4k --size=512G --rw=randread --norandommap --direct=1 --ioengine=libaio \
>     --iodepth=64 --runtime=60 --group_reporting=1 --name=nullb0 \
>     --filename=/dev/nullb0 --numjobs=32
>
> test machine: HP DL380, 16 cpu cores, 2 threads per core, dual
> sockets/numa, Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
>
> 2) test result:
> - srcu quiesce: 6063K IOPS
> - percpu-ref quiesce: 6113K IOPS
>
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
> Cc: Bart Van Assche <bvanassche@xxxxxxx>
> Cc: Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx>
> Cc: Chao Leng <lengchao@xxxxxxxxxx>
> ---
>  block/blk-mq-sysfs.c   |   2 -
>  block/blk-mq.c         | 130 +++++++++++++++++++++--------------------
>  block/blk-sysfs.c      |   6 +-
>  include/linux/blk-mq.h |   8 ---
>  include/linux/blkdev.h |   4 ++
>  5 files changed, 77 insertions(+), 73 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@xxxxxxx>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
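
For readers following the thread, the quiesce requirement described in the quoted commit message (every read section observes the quiesced flag once the quiesce call returns) can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions, not the patch's code and not the kernel's percpu_ref API: the names `toy_ref`, `toy_dispatch_enter()`, `toy_dispatch_exit()`, `toy_quiesce_start()`, `toy_quiesce_done()` and the constant `NR_SLOTS` are all hypothetical, the model is meant to be driven from a single thread, and it leaves out the RCU grace period that the real mode switch depends on. Refusing new readers after quiesce is also a simplification of how dispatch actually checks the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 4 /* hypothetical stand-in for the number of CPUs */

/* Toy reference counter: per-slot counts on the fast path, one shared
 * atomic count after the switch to "atomic mode". */
struct toy_ref {
    long slot_count[NR_SLOTS]; /* fast path: plain per-slot increments */
    atomic_long shared_count;  /* slow path: used after quiesce starts */
    atomic_bool percpu_mode;   /* true while the fast path is active */
    atomic_bool quiesced;      /* analogue of QUEUE_FLAG_QUIESCED */
};

static void toy_ref_init(struct toy_ref *r)
{
    for (int i = 0; i < NR_SLOTS; i++)
        r->slot_count[i] = 0;
    atomic_init(&r->shared_count, 0);
    atomic_init(&r->percpu_mode, true);
    atomic_init(&r->quiesced, false);
}

/* Reader side: enter a dispatch section. Returns false once the queue
 * has been marked quiesced (a simplification, see the note above). */
static bool toy_dispatch_enter(struct toy_ref *r, int slot)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    if (atomic_load(&r->quiesced))
        return false;
    if (atomic_load(&r->percpu_mode))
        r->slot_count[slot]++;                 /* cheap, no barrier */
    else
        atomic_fetch_add(&r->shared_count, 1);
    return true;
}

/* Reader side: leave a dispatch section. */
static void toy_dispatch_exit(struct toy_ref *r, int slot)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    if (atomic_load(&r->percpu_mode))
        r->slot_count[slot]--;
    else
        atomic_fetch_sub(&r->shared_count, 1);
}

/* Writer side: mark the queue quiesced, leave the fast path, and fold
 * the per-slot counts into the shared counter. */
static void toy_quiesce_start(struct toy_ref *r)
{
    long sum = 0;

    atomic_store(&r->quiesced, true);
    atomic_store(&r->percpu_mode, false);
    for (int i = 0; i < NR_SLOTS; i++) {
        sum += r->slot_count[i];
        r->slot_count[i] = 0;
    }
    atomic_fetch_add(&r->shared_count, sum);
}

/* Writer side: true once every section that was in flight has exited. */
static bool toy_quiesce_done(struct toy_ref *r)
{
    return atomic_load(&r->shared_count) == 0;
}
```

In this model a caller would enter and exit sections on a few slots, call `toy_quiesce_start()`, and then poll `toy_quiesce_done()`, which only reports true after the last in-flight section has called `toy_dispatch_exit()`. Splitting "start" from "done" loosely mirrors why an async quiesce interface becomes easier to add.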