On 2/18/2020 9:41 AM, Keith Busch wrote:
On Tue, Feb 18, 2020 at 10:54:54AM -0500, Tim Walker wrote:
With regard to our discussion on queue depths, it's common knowledge
that an HDD chooses commands from its internal command queue to
optimize performance. The HDD looks at things like the current
actuator position, current media rotational position, power
constraints, command age, etc. to choose the best next command to
service. A large number of commands in the queue gives the HDD a
better selection of commands from which to choose to maximize
throughput/IOPS/etc., but at the expense of added latency due to
commands sitting in the queue.
NVMe doesn't allow us to pull commands randomly from the SQ, so the
HDD should attempt to fill its internal queue from the various SQs,
according to the SQ servicing policy, so that it has a large number of
commands to choose from for its internal command processing
optimization.
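The depth/latency trade-off above can be made concrete with a toy model. This is not how any real drive firmware works: tracks are hypothetical integers, and the only scheduling criterion is head travel (no rotational position, power, or command age). The function name `service` is made up for illustration. It fetches commands FIFO into an internal window of at most `depth` entries and services the nearest one first (greedy shortest-seek-first).

```python
from collections import deque
from typing import List, Tuple


def service(requests: List[int], depth: int, start: int = 0) -> Tuple[int, List[int]]:
    """Toy model (hypothetical): service `requests` (target track numbers, in
    submission order) with greedy shortest-seek-first over an internal queue
    holding at most `depth` commands. Returns (total head travel, order)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    pending = deque(requests)   # not yet fetched; consumed FIFO, like an SQ
    internal: List[int] = []    # the drive's internal reorder window
    head = start
    travel = 0
    order: List[int] = []
    while pending or internal:
        # Top up the internal queue in arrival order, never beyond `depth`.
        while pending and len(internal) < depth:
            internal.append(pending.popleft())
        # Choose the queued command nearest the current head position.
        nxt = min(internal, key=lambda track: abs(track - head))
        internal.remove(nxt)
        travel += abs(nxt - head)
        head = nxt
        order.append(nxt)
    return travel, order


workload = [90, 10, 80, 20, 70, 30]
print(service(workload, depth=1))  # (390, [90, 10, 80, 20, 70, 30])
print(service(workload, depth=6))  # (90, [10, 20, 30, 70, 80, 90])
```

With a depth of 6 the head travels 90 tracks instead of 390, but the command submitted first (track 90) is serviced last, which is the added latency the paragraph above describes.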
You don't need multiple queues for that. While the device has to fetch
commands from a host's submission queue in FIFO order, it may reorder
their execution and completion however it wants, which you can do with
a single queue.
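Keith's point can be sketched with one ring: the device consumes entries from the head strictly in order, yet completions come back in whatever order it executes them, and the host matches each completion by its command identifier (CID). `SubmissionQueue`, `device_run` and the `track` field are hypothetical names, and the nearest-first execution order is only an assumed stand-in for a drive's real scheduler. The "one slot left empty" full condition follows NVMe's queue rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Command:
    cid: int    # command identifier, echoed back in the completion entry
    track: int  # hypothetical target position used by the toy scheduler


class SubmissionQueue:
    """Circular submission queue. The host advances the tail; the device
    consumes entries strictly in order from the head. One slot is left
    empty so that 'full' (tail + 1 == head) differs from 'empty'."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[Command]] = [None] * size
        self.size = size
        self.head = 0
        self.tail = 0

    def submit(self, cmd: Command) -> None:
        if (self.tail + 1) % self.size == self.head:
            raise RuntimeError("submission queue full")
        self.slots[self.tail] = cmd
        self.tail = (self.tail + 1) % self.size

    def fetch(self) -> Optional[Command]:
        if self.head == self.tail:
            return None  # queue empty
        cmd = self.slots[self.head]
        self.head = (self.head + 1) % self.size
        return cmd


def device_run(sq: SubmissionQueue, head_pos: int = 0) -> List[int]:
    """Fetch everything in FIFO order, then execute nearest-first
    (an assumed policy) and return the CIDs in completion order."""
    fetched: List[Command] = []
    cmd = sq.fetch()
    while cmd is not None:
        fetched.append(cmd)
        cmd = sq.fetch()
    completions: List[int] = []
    while fetched:
        nxt = min(fetched, key=lambda c: abs(c.track - head_pos))
        fetched.remove(nxt)
        head_pos = nxt.track
        completions.append(nxt.cid)
    return completions


sq = SubmissionQueue(size=8)
outstanding: Dict[int, Command] = {}
for cid, track in enumerate([90, 10, 80, 20]):
    cmd = Command(cid, track)
    sq.submit(cmd)
    outstanding[cid] = cmd

order = device_run(sq)
print(order)  # [1, 3, 2, 0]
for cid in order:
    outstanding.pop(cid)  # host matches each completion by CID, not position
```

Submission order was CIDs 0, 1, 2, 3, but completions arrive as 1, 3, 2, 0; because every completion carries its CID, the host never needs a second queue to tolerate the reordering.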
It seems to me that the host would want to limit the total number of
outstanding commands to an NVMe HDD.
The host shouldn't have to decide on limits. NVMe lets the device report
its queue count and depth. It should be the device's responsibility to
report appropriate values that maximize IOPS within your latency limits,
and the host will react accordingly.
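On the host side, reacting accordingly largely comes down to clamping its wishes to what the device reported. A minimal sketch, assuming the standard NVMe fields: CAP.MQES (bits 15:00 of the Controller Capabilities register) and NSQA/NCQA (returned in completion dword 0 of Set Features, Number of Queues), all of which are 0's based. The helper names and the example register values are hypothetical.

```python
from typing import Tuple


def max_queue_entries(cap: int) -> int:
    """CAP.MQES occupies bits 15:00 of the 64-bit Controller Capabilities
    register and is 0's based, so the usable queue size is MQES + 1."""
    return (cap & 0xFFFF) + 1


def allocated_io_queues(dw0: int) -> Tuple[int, int]:
    """Decode completion dword 0 of Set Features (Number of Queues):
    NSQA in bits 15:00 and NCQA in bits 31:16, both 0's based."""
    nsqa = (dw0 & 0xFFFF) + 1
    ncqa = ((dw0 >> 16) & 0xFFFF) + 1
    return nsqa, ncqa


def plan_io_queues(cap: int, dw0: int, want_queues: int, want_depth: int) -> Tuple[int, int]:
    """Hypothetical helper: clamp the host's preferred queue count and
    depth to the values the device reported."""
    nsqa, ncqa = allocated_io_queues(dw0)
    queues = min(want_queues, nsqa, ncqa)  # SQ/CQ used in pairs here
    depth = min(want_depth, max_queue_entries(cap))
    return queues, depth


# Hypothetical drive: one I/O queue pair and MQES = 31 (i.e. 32 entries).
CAP = 0x1F
DW0 = 0x0000_0000
print(plan_io_queues(CAP, DW0, want_queues=8, want_depth=1024))  # (1, 32)
```

Because an NVMe queue is full when the tail is one slot behind the head, 32 entries still means at most 31 commands outstanding per queue, so a drive can cap host concurrency purely through what it reports.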
+1 on Keith's comments. Also, if a namespace depth limit needs to be
introduced, it should be defined via the NVMe committee and then reported
back as device attributes. Many of SCSI's problems arose where the
protocol didn't solve an issue, especially in multi-initiator
environments, which led to all kinds of requirements and mish-mashes in
host stacks and target behaviors. None of that should be repeated.
-- james