On 10/25/19 12:58 AM, Ming Lei wrote:
> It isn't necessary to check the host depth in scsi_queue_rq() any more
> since it has been respected by blk-mq before calling scsi_queue_rq() via
> getting driver tag.
>
> Lots of LUNs may attach to the same host and per-host IOPS may reach millions,
> so we should avoid expensive atomic operations on the host-wide counter in
> the IO path.
>
> This patch implements scsi_host_busy() via blk_mq_tagset_busy_iter()
> with one scsi command state for reading the count of busy IOs for scsi_mq.
>
> It is observed that IOPS is increased by 15% in IO test on scsi_debug (32
> LUNs, 32 submit queues, 1024 can_queue, libaio/dio) in a dual-socket
> system.
>
> V5:
>   - fix document on .can_queue, no code change
>
> V4:
>   - fix one build warning, just a line change in scsi_dev_queue_ready()
>
> V3:
>   - use non-atomic set/clear bit operations as suggested by Bart
>   - kill single field struct for storing count of in-flight requests
>   - add patch to bypass the atomic LUN-wide counter of device_busy
>     for fast SSD device
>
> V2:
>   - introduce SCMD_STATE_INFLIGHT for getting accurate host busy
>     via blk_mq_tagset_busy_iter()
>   - verified that original Jens's report[1] is fixed
>   - verified that SCSI timeout/abort works fine
>
> [1] https://www.spinics.net/lists/linux-scsi/msg122867.html
> [2] V1 & its revert:

Reviewed-by: Jens Axboe <axboe@xxxxxxxxx>

-- 
Jens Axboe
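
For readers following the quoted cover letter: the counting scheme it describes (derive the host busy count by walking the allocated tags and testing a per-command in-flight bit, instead of maintaining a host-wide atomic counter) can be modelled in plain userspace C. This is only a rough sketch and not the kernel patch. Every `model_*` and `MODEL_*` name below is hypothetical, the fixed-size array stands in for the tag set, and the callback-based walker only imitates the general shape of a busy-tag iterator such as blk_mq_tagset_busy_iter().

```c
/* Userspace model only; NOT kernel code. All names here are hypothetical. */
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CAN_QUEUE 8                /* stand-in for the host's can_queue */
#define MODEL_STATE_INFLIGHT (1u << 0)   /* stand-in for SCMD_STATE_INFLIGHT */

struct model_cmd {
    bool tag_allocated;     /* true while the request holds a driver tag */
    unsigned int state;     /* per-command state bits */
};

struct model_host {
    struct model_cmd cmds[MODEL_CAN_QUEUE];   /* one slot per tag */
};

typedef void (*model_iter_fn)(struct model_cmd *cmd, void *data);

/* Call fn for every command that currently holds a tag. */
static void model_tagset_busy_iter(struct model_host *host,
                                   model_iter_fn fn, void *data)
{
    for (size_t i = 0; i < MODEL_CAN_QUEUE; i++)
        if (host->cmds[i].tag_allocated)
            fn(&host->cmds[i], data);
}

/* Iterator callback: count only commands marked in flight. */
static void model_count_inflight(struct model_cmd *cmd, void *data)
{
    unsigned int *count = data;

    if (cmd->state & MODEL_STATE_INFLIGHT)
        (*count)++;
}

/* Slow path: compute the busy count on demand by walking the tags. */
static unsigned int model_host_busy(struct model_host *host)
{
    unsigned int count = 0;

    model_tagset_busy_iter(host, model_count_inflight, &count);
    return count;
}

/* Fast path: only per-command state is written, no shared counter. */
static bool model_queue_cmd(struct model_host *host, size_t tag)
{
    if (tag >= MODEL_CAN_QUEUE || host->cmds[tag].tag_allocated)
        return false;
    host->cmds[tag].tag_allocated = true;
    host->cmds[tag].state |= MODEL_STATE_INFLIGHT;
    return true;
}

/* Completion: clear the in-flight bit, then release the tag. */
static void model_complete_cmd(struct model_host *host, size_t tag)
{
    if (tag >= MODEL_CAN_QUEUE)
        return;
    host->cmds[tag].state &= ~MODEL_STATE_INFLIGHT;
    host->cmds[tag].tag_allocated = false;
}
```

In this model the submit and complete paths touch only the command's own slot, so nothing is shared between submitters. The cost moves to model_host_busy(), which is linear in the number of tags. That trade-off fits the case described in the cover letter, where the busy count is read far less often than IO is submitted.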