Re: [LSF/MM/BPF BOF] Userspace command abouts

Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> · Tue, 28 Feb 2023 06:42:00 +0900

On 2/28/23 02:44, Keith Busch wrote:
> On Mon, Feb 27, 2023 at 06:28:41PM +0100, Hannes Reinecke wrote:
>> On 2/27/23 17:33, Sagi Grimberg wrote:
>>>
>>> I'm not up to speed on how CDL is defined, but it's unclear to me how CDL at
>>> the queue level would cause the host to open more queues.
> 
> Because each CDL class would need its own submission queue in that scheme. They
> can all share a single completion queue, so this scheme doesn't necessarily
> increase the number of interrupt vectors.
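
A rough model of the queue-per-class scheme described above, as a sketch only. It assumes one queue set per CPU and one interrupt vector per completion queue. The class `CdlQueueSet` and function `queue_totals` are hypothetical names for illustration, not anything in the Linux NVMe driver or the NVMe spec.

```python
from collections import deque
from typing import Tuple


class CdlQueueSet:
    """Hypothetical per-CPU queue set: one submission queue (SQ) per CDL
    class, all posting completions to one shared completion queue (CQ)."""

    def __init__(self, num_cdl_classes: int):
        self.sqs = [deque() for _ in range(num_cdl_classes)]
        self.cq = deque()

    def submit(self, cdl_class: int, cmd_id: int) -> None:
        # The CDL class is implied by which SQ the command is placed on.
        self.sqs[cdl_class].append(cmd_id)

    def complete_next(self, cdl_class: int) -> int:
        # Whichever SQ a command came from, its completion lands on the one CQ.
        cmd_id = self.sqs[cdl_class].popleft()
        self.cq.append((cdl_class, cmd_id))
        return cmd_id


def queue_totals(num_cpus: int, num_cdl_classes: int) -> Tuple[int, int]:
    """Return (total SQs, total CQs). Assumes one queue set per CPU and one
    interrupt vector per CQ, so the CQ count is also the vector count."""
    return num_cpus * num_cdl_classes, num_cpus
```

With 8 CPUs and 4 classes, `queue_totals(8, 4)` gives `(32, 8)`: the number of submission queues grows with the number of classes, while the completion queue (and vector) count stays at one per CPU.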
> 
>>> Another question: does CDL have any relationship with NVMe "Time Limited
>>> Error Recovery" (TLER), where the host can set a feature for the timeout and
>>> indicate per command whether the controller should respect it?
>>>
>>> While this is not a full-blown scheme where every queue/command has its own
>>> timeout, it could address the original use case given by Hannes. And it's
>>> already there.
>> I guess that is the NVMe version of CDLs; can you give me a reference for
>> it?
> 
> They're not the same. TLER starts timing after a command experiences a
> recoverable error, whereas CDL is an end-to-end time limit for all commands.
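
To make the difference concrete, here is a deliberately simplified sketch of when each timer starts. It reduces the outcome to a boolean and ignores what a device actually does once a limit expires; the function names are hypothetical and do not come from the NVMe or T10/T13 specifications.

```python
from typing import Optional


def aborted_by_cdl(received: float, now: float, limit: float) -> bool:
    """CDL-style check (simplified): the limit covers the whole command,
    measured from when the device received it."""
    return now - received > limit


def aborted_by_tler(first_error: Optional[float], now: float, limit: float) -> bool:
    """TLER-style check (simplified): no timer runs until the command hits
    a recoverable error; the limit then bounds the recovery attempt."""
    if first_error is None:
        return False  # a command that never errors is never cut off by TLER
    return now - first_error > limit
```

For a command received at t=0 ms that hits a recoverable error at t=80 ms, checked at t=120 ms with a 50 ms limit, the CDL-style check has expired (120 > 50) while the TLER-style one has not (40 < 50). A slow but error-free command is never cut off by the TLER-style check at all.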

Note here that with the current T10/T13 CDL definitions, end-to-end actually
means from the time the command is received by the device to the time the device
signals the command completion.

That does not include transport and host adapter queueing time (if there is an
HBA). And I guess this is the issue at hand for fabrics: how to integrate the
transport times. The CDL descriptors could perhaps have one additional limit for
that, but then the definition of the duration guideline limit would need to be
tweaked.
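
A sketch of the accounting gap, assuming a latency split into host-side queueing, fabric transport and device time. The `transport_limit` parameter stands in for the hypothetical extra limit suggested above; it is not part of any current CDL descriptor, and all names here are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommandTiming:
    """Hypothetical breakdown of one command's latency, in milliseconds."""
    host_queueing: float  # time queued in the host (and HBA, if any)
    transport: float      # fabric transfer time, both directions
    device: float         # device receipt of the command to completion signalled

    def device_view(self) -> float:
        # What the current T10/T13 CDL definitions measure.
        return self.device

    def host_view(self) -> float:
        # What the application actually waits for.
        return self.host_queueing + self.transport + self.device


def meets_limits(t: CommandTiming, device_limit: float,
                 transport_limit: Optional[float] = None) -> bool:
    """Check a command against a CDL-style device limit and, optionally, a
    hypothetical extra limit on the time spent outside the device."""
    ok = t.device_view() <= device_limit
    if transport_limit is not None:
        ok = ok and (t.host_queueing + t.transport) <= transport_limit
    return ok
```

For example, `CommandTiming(host_queueing=5.0, transport=30.0, device=40.0)` meets a 50 ms device limit even though the host waits 75 ms in total; only the extra limit (say 20 ms for the 35 ms spent outside the device) would catch it.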

-- 
Damien Le Moal
Western Digital Research