Re: [PATCH 5/6] scsi: core: don't limit per-LUN queue depth for SSD when HBA needs

Hi Martin,

>> A free host tag does not guarantee that the target device can queue the command.

Your point is absolutely correct for a single SSD device, and probably for some low-end controllers, but not for a high-end HBA that has its own queuing mechanism.

High-end controllers might expose a SCSI interface, but can have all kinds of devices (NVMe/SCSI/SATA) behind it, and have their own capability to queue IO and feed it to the devices as needed. Those devices should not be penalized with the overhead of the device_busy counter just because they chose to expose themselves as SCSI devices (for historical and backward-compatibility reasons). Rather, they should be enabled so that they can compete with devices exposing themselves as NVMe devices. It is those devices that this patch is meant for, and Ming has provided a specific flag for it. Devices that cannot tolerate more outstanding IO need not set the flag and will be unaffected.
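
To make the mechanism concrete, here is a minimal user-space sketch of how such a whitelist flag could gate the per-LUN accounting. The structure, field, and flag names below are illustrative stand-ins, not the actual kernel structures or the flag from Ming's patch:

/*
 * Sketch only, not kernel code: a per-device flag lets the fast path
 * skip the device_busy accounting entirely. Names are illustrative.
 */
#include <stdatomic.h>
#include <stdbool.h>

struct sdev_sketch {
	atomic_int device_busy;	/* per-LUN in-flight command count */
	int queue_depth;	/* negotiated per-LUN queue depth */
	bool hba_does_queueing;	/* hypothetical whitelist flag set by the LLD */
};

/* Admit a command only while the per-LUN budget allows it, unless the
 * HBA has declared that it queues on behalf of its devices. */
static bool sdev_queue_ready(struct sdev_sketch *sdev)
{
	if (sdev->hba_does_queueing)
		return true;	/* host tags remain the only limit */

	if (atomic_fetch_add(&sdev->device_busy, 1) + 1 <= sdev->queue_depth)
		return true;

	atomic_fetch_sub(&sdev->device_busy, 1);	/* over budget, back off */
	return false;
}

static void sdev_queue_done(struct sdev_sketch *sdev)
{
	if (!sdev->hba_does_queueing)
		atomic_fetch_sub(&sdev->device_busy, 1);
}

With the flag set, the submission fast path never touches the shared per-LUN atomic, which is exactly the device_busy overhead being objected to.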

In my humble opinion, the SCSI stack should be flexible enough to support innovation and not limit some controllers just because others have limited capability, especially when a whitelist flag is provided so that the less capable devices are unaffected.

Sincerely,
Sumanesh


On 1/20/2020 9:52 PM, Martin K. Petersen wrote:
Ming,

> NVMe doesn't have such per-request-queue (namespace) queue depth, so
> it is reasonable to ignore the limit for SCSI SSD too.

It is really not. A free host tag does not guarantee that the target
device can queue the command.

The assumption that SSDs are somehow special because they are "fast" is
not valid. Given the common hardware queue depth for a SAS device of
~128 it is often trivial to drive a device into a congestion
scenario. We see it all the time for non-rotational devices, SSDs and
arrays alike. The SSD heuristic is simply not going to fly.

Don't get me wrong, I am very sympathetic to obliterating device_busy in
the hot path. I just don't think it is as easy as just ignoring the
counter and hope for the best. Dynamic queue depth management is an
integral part of the SCSI protocol, not something we can just decide to
bypass because a device claims to be of a certain media type or speed.

I would prefer not to touch drivers that rely on cmd_per_lun / untagged
operation and focus exclusively on the ones that use .track_queue_depth.
For those we could consider an adaptive queue depth management scheme.
Something like not maintaining device_busy until we actually get a
QUEUE_FULL condition. And then rely on the existing queue depth ramp up
heuristics to determine when to disable the busy counter again. Maybe
with an additional watermark or time limit to avoid flip-flopping.
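
To illustrate, here is a minimal user-space sketch of the adaptive scheme described above. All names are illustrative, and the 30-second window stands in for the suggested watermark/time limit:

/*
 * Sketch only, not kernel code: device_busy is only maintained after a
 * QUEUE_FULL, and tracking is switched off again once the ramped-up
 * depth has been stable for an (assumed) grace period.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define STABLE_SECS 30	/* assumed anti-flip-flop time limit */

struct sdev_sketch {
	atomic_int device_busy;
	int queue_depth;
	atomic_bool tracking;	/* maintain device_busy at all? */
	time_t last_queue_full;
};

/* On a QUEUE_FULL response: start (or keep) tracking; the existing
 * ramp-down logic would also shrink queue_depth here. */
static void on_queue_full(struct sdev_sketch *sdev)
{
	sdev->last_queue_full = time(NULL);
	atomic_store(&sdev->tracking, true);
}

/* Hooked into the existing ramp-up heuristic: stop tracking once the
 * depth is back at maximum and has been stable long enough. */
static void maybe_stop_tracking(struct sdev_sketch *sdev, int max_depth)
{
	if (sdev->queue_depth >= max_depth &&
	    time(NULL) - sdev->last_queue_full >= STABLE_SECS)
		atomic_store(&sdev->tracking, false);
}

/* Fast path: the shared atomic is only touched while tracking is on. */
static bool sdev_queue_ready(struct sdev_sketch *sdev)
{
	if (!atomic_load(&sdev->tracking))
		return true;

	if (atomic_fetch_add(&sdev->device_busy, 1) + 1 <= sdev->queue_depth)
		return true;

	atomic_fetch_sub(&sdev->device_busy, 1);
	return false;
}

One wrinkle such a sketch exposes: commands already in flight when tracking switches on were never counted, so device_busy undercounts until they complete. That is part of why this is not as easy as just ignoring the counter and hoping for the best.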

If that approach turns out to work, we should convert all remaining
non-legacy drivers to .track_queue_depth so we only have two driver
queuing flavors to worry about.



