Hi Ming,

I have tested this patch extensively in our labs. This patch gives excellent results when a single device can provide very high IOPS and only a few of those devices are available on the system. Thus, if a RAID 0 volume is created out of many high-end NVMe devices, that RAID 0 volume can potentially reach a max IOPS that is the sum of the max IOPS of all the underlying drives. Without this patch, the current kernel code cannot get there. For example, for a simple RAID 0 volume with 32 NVMe drives, I got an almost 100% performance boost with this patch. The NVMe stack does not have this limitation, and this patch goes a long way toward closing that gap.

I have also tested it in many other configurations and did not see any adverse side effects.

Please feel free to add:
Tested-by: Sumanesh Samanta

Thanks,
Sumanesh

On Tue, Sep 22, 2020 at 7:33 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> Hi,
>
> SCSI uses one global atomic variable to track queue depth for each
> LUN/request queue. This approach can't scale well when there are many CPU
> cores and the disk is very fast. The Broadcom guys have complained that their
> high-end HBA can't reach top performance because .device_busy is
> operated in the IO path.
>
> Replace the atomic variable sdev->device_busy with sbitmap for
> tracking SCSI device queue depth.
>
> Testing on scsi_debug shows this approach improves IOPS by > 20%. Meanwhile,
> the IOPS difference is just ~1% compared with bypassing .device_busy
> on scsi_debug via patches[1].
>
> The first 6 patches move the percpu allocation hint into sbitmap, since
> the improvement from doing the percpu allocation hint in sbitmap is
> observable. They also export helpers for SCSI.
>
> Patches 7 and 8 prepare for the conversion by returning a budget token
> from the .get_budget callback, and by passing the budget token to the driver
> via 'struct blk_mq_queue_data' in .queue_rq().
>
> The last four patches change SCSI to track device queue depth via
> sbitmap.
>
> The patchset has been tested by Broadcom, and an obvious performance boost
> can be observed.
>
> Given it is based on both for-5.10/block and 5.10/scsi-queue, the target
> is v5.11. It is posted now to get a full review.
>
> Please comment and review!
>
> V3:
> - rebase on both for-5.10/block and 5.10/scsi-queue
>
> V2:
> - fix one build failure
>
>
> Ming Lei (12):
>   sbitmap: remove sbitmap_clear_bit_unlock
>   sbitmap: maintain allocation round_robin in sbitmap
>   sbitmap: add helpers for updating allocation hint
>   sbitmap: move allocation hint into sbitmap
>   sbitmap: export sbitmap_weight
>   sbitmap: add helper of sbitmap_calculate_shift
>   blk-mq: add callbacks for storing & retrieving budget token
>   blk-mq: return budget token from .get_budget callback
>   scsi: put hot fields of scsi_host_template into one cacheline
>   scsi: add scsi_device_busy() to read sdev->device_busy
>   scsi: make sure sdev->queue_depth is <= shost->can_queue
>   scsi: replace sdev->device_busy with sbitmap
>
>  block/blk-mq-sched.c                 |  17 ++-
>  block/blk-mq.c                       |  38 +++--
>  block/blk-mq.h                       |  25 +++-
>  block/kyber-iosched.c                |   3 +-
>  drivers/message/fusion/mptsas.c      |   2 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c |   2 +-
>  drivers/scsi/scsi.c                  |   4 +
>  drivers/scsi/scsi_lib.c              |  69 ++++++---
>  drivers/scsi/scsi_priv.h             |   1 +
>  drivers/scsi/scsi_scan.c             |  22 ++-
>  drivers/scsi/scsi_sysfs.c            |   4 +-
>  drivers/scsi/sg.c                    |   2 +-
>  include/linux/blk-mq.h               |  13 +-
>  include/linux/sbitmap.h              |  84 +++++++----
>  include/scsi/scsi_cmnd.h             |   2 +
>  include/scsi/scsi_device.h           |   8 +-
>  include/scsi/scsi_host.h             |  72 ++++-----
>  lib/sbitmap.c                        | 213 +++++++++++++++------------
>  18 files changed, 376 insertions(+), 205 deletions(-)
>
> Cc: Omar Sandoval <osandov@xxxxxx>
> Cc: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx>
> Cc: Sumanesh Samanta <sumanesh.samanta@xxxxxxxxxxxx>
> Cc: Ewan D. Milne <emilne@xxxxxxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxx>
>
> --
> 2.25.2
>