On 2/28/23 02:44, Keith Busch wrote:
> On Mon, Feb 27, 2023 at 06:28:41PM +0100, Hannes Reinecke wrote:
>> On 2/27/23 17:33, Sagi Grimberg wrote:
>>>
>>> I'm not up to speed on how CDL is defined, but I'm unclear how CDL at
>>> the queue level would cause the host to open more queues?
>
> Because each CDL class would need its own submission queue in that scheme. They
> can all share a single completion queue, so this scheme doesn't necessarily
> increase the number of interrupt vectors.
>
>>> Another question, does CDL have any relationship with NVMe "Time Limited
>>> Error Recovery"? where the host can set a feature for timeout and
>>> indicate if the controller should respect it per command?
>>>
>>> While this is not a full-blown every queue/command has its own timeout,
>>> it could address the original use-case given by Hannes. And it's already
>>> there.
>> I guess that is the NVMe version of CDLs; can you give me a reference for
>> it?
>
> They're not the same. TLER starts timing after a command experiences a
> recoverable error, where CDL is an end-to-end timing for all commands.

Note here that with the current T10/T13 CDL definitions, end-to-end
actually means from the time the command is received by the device to
the time the device signals the command completion. That does not
include the transport & host adapter queueing (if there is an HBA).

And I guess this is the issue at hand for fabrics: how to integrate the
transport times. I guess the CDL descriptors could have one additional
limit for that, but then the duration guideline limit definition would
need to be tweaked.

-- 
Damien Le Moal
Western Digital Research