From: Melanie Plageman <melanieplageman@xxxxxxxxx> Sent: Monday, March 8, 2021 9:56 AM > > On Mon, Mar 08, 2021 at 02:37:40PM +0000, Michael Kelley wrote: > > From: Melanie Plageman (Microsoft) <melanieplageman@xxxxxxxxx> Sent: Friday, March > 5, 2021 3:22 PM > > > > > > The scsi_device->queue_depth is set to Scsi_Host->cmd_per_lun during > > > allocation. > > > > > > Cap cmd_per_lun at can_queue to avoid dispatch errors. > > > > > > Signed-off-by: Melanie Plageman (Microsoft) <melanieplageman@xxxxxxxxx> > > > --- > > > drivers/scsi/storvsc_drv.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > > > index 6bc5453cea8a..d7953a6e00e6 100644 > > > --- a/drivers/scsi/storvsc_drv.c > > > +++ b/drivers/scsi/storvsc_drv.c > > > @@ -1946,6 +1946,8 @@ static int storvsc_probe(struct hv_device *device, > > > (max_sub_channels + 1) * > > > (100 - ring_avail_percent_lowater) / 100; > > > > > > + scsi_driver.cmd_per_lun = min_t(u32, scsi_driver.cmd_per_lun, > scsi_driver.can_queue); > > > + > > > > I'm not sure what you mean by "avoid dispatch errors". Can you elaborate? > > The scsi_driver.cmd_per_lun is set to 2048. Which is then used to set > Scsi_Host->cmd_per_lun in storvsc_probe(). > > In storvsc_probe(), when doing scsi_scan_host(), scsi_alloc_sdev() is > called and sets the scsi_device->queue_depth to the Scsi_Host's > cmd_per_lun with this code: > > scsi_change_queue_depth(sdev, sdev->host->cmd_per_lun ? > sdev->host->cmd_per_lun : 1); > > During dispatch, the scsi_device->queue_depth is used in > scsi_dev_queue_ready(), called by scsi_mq_get_budget() to determine > whether or not the device can queue another command. > > On some machines, with the 2048 value of cmd_per_lun that was used to > set the initial scsi_device->queue_depth, commands can be queued that > are later not able to be dispatched after running out of space in the > ringbuffer. > > On an 8 core Azure VM with 16GB of memory with a single 1 TiB SSD > (running an fio workload that I can provide if needed), storvsc_do_io() > ends up often returning SCSI_MLQUEUE_DEVICE_BUSY. > > This is the call stack: > > hv_get_bytes_to_write > hv_ringbuffer_write > vmbus_send_packet > storvsc_dio_io > storvsc_queuecommand > scsi_dispatch_cmd > scsi_queue_rq > dispatch_rq_list > > > Be aware that the calculation of "can_queue" in this driver is somewhat > > flawed -- it should not be based on the size of the ring buffer, but instead on > > the maximum number of requests Hyper-V will queue. And even then, > > can_queue doesn't provide the cap you might expect because the blk-mq layer > > allocates can_queue tags for each HW queue, not as a total. > > > The docs for scsi_mid_low_api document Scsi_Host can_queue this way: > > can_queue > - must be greater than 0; do not send more than can_queue > commands to the adapter. > > I did notice that in scsi_host.h, the comment for can_queue does say > can_queue is the "maximum number of simultaneous commands a single hw > queue in HBA will accept." Yes, this comment is correct. The can_queue value is per HW queue. > However, I don't see it being used this way > in the code. > > During dispatch, In scsi_target_queue_ready(), there is this code: > > if (busy >= starget->can_queue) > goto starved; > > And the scsi_target->can_queue value should be coming from Scsi_host as > mentioned in the scsi_target definition in scsi_device.h > /* > * LLDs should set this in the slave_alloc host template callout. > * If set to zero then there is not limit. > */ > unsigned int can_queue; > > So, I don't really see how this would be per hardware queue. For the storvsc driver, the can_queue value in the scsi_target is initialized to zero in scsi_alloc_target() and it remains unchanged. Maybe I'm missing something, but the only place I see that sets starget->can_queue to a non-zero value is iscsi_target_alloc(). The storvsc slave_alloc() function does not set it. So the test in scsi_target_queue_ready() for exceeding can_queue never executes. We've run live tests, and can see that the number of requests sent to the storvsc driver exceeds the can_queue value when the # of HW queues is greater than 1. That result is consistent with what I see in the code. Michael > > > > > I agree that the cmd_per_lun setting is also too big, but we should fix that in > > the context of getting all of these different settings working together correctly, > > and not piecemeal. > > > > Capping Scsi_Host->cmd_per_lun to scsi_driver.can_queue during probe > will also prevent the LUN queue_depth from being set to a value that is > higher than it can ever be set to again by the user when > storvsc_change_queue_depth() is invoked. > > Also in scsi_sysfs sdev_store_queue_depth() there is this check: > > if (depth < 1 || depth > sdev->host->can_queue) > return -EINVAL; > > I would also note that VirtIO SCSI in virtscsi_probe(), Scsi_Host->cmd_per_lun > is set to the min of the configured cmd_per_lun and > Scsi_Host->can_queue: > > shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue); > > Best, > Melanie