Re: [LSF/MM/BPF BOF] Userspace command abouts

On 2/24/2023 8:15 PM, Damien Le Moal wrote:
On 2/25/23 10:51, Keith Busch wrote:
On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote:
I do think that we should work on CDL for NVMe, as it would solve some of
the timeout-related problems more effectively than aborts or any other
mechanism.

That proposal exists in the NVMe TWG, but doesn't appear to have recent activity.
The last I heard, one point of contention was where the duration limit property
exists: within the command, or the queue. From my perspective, if it's not at
the queue level, the limit becomes meaningless, but hey, it's not up to me.

Limit attached to the command makes things more flexible and easier for the
host, so personally, I prefer that. But this has an impact on the controller:
the device needs to pull in *all* commands to be able to know the limits and do
scheduling/aborts appropriately. That is not something that the device designers
like, for obvious reasons (device internal resources...).

On the other hand, limits attached to queues could lead to either a serious
increase in the number of queues (PCI space & number of IRQ vectors limits), or,
loss of performance as a particular queue with the desired limit would be
accessed from multiple CPUs on the host (lock contention). It's a tricky
problem, I think, with lots of compromises.


From a fabrics perspective:

- at the command: workable. However, the times are distorted, as the limit won't include the fabric transmission time of the cmd or rsp, nor any retransmission of the cmd or rsp that the fabric performs to protect against loss.

- at the queue: not workable. It effectively becomes a host transport timer, as the CDL has to cover all fabric transmission times and the only entity that can time and enforce the timer is the host transport. Also, what does the host transport do when the timer expires? There are only a couple of things it can do, all of them disruptive, and at best they delay the response back to the caller.

- CDL can only be meaningful (i.e. completion times close to the CDL) in the absence of transport errors. Cmd termination, perhaps tied to connection loss/failure detection as well as connection/queue termination or association termination, can have timers that are well above the CDL value. Any guarantee of cmd completion within time X can become meaningless.
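
To make the distortion in the per-command case concrete, here is a rough sketch in Python. Everything in it is hypothetical (the names, the millisecond values, the simple additive model); it is not taken from any spec or driver. It only illustrates that a command can meet its limit as measured at the controller and still miss it as seen by the host caller once fabric transit and retransmission are added.

```python
from dataclasses import dataclass


@dataclass
class FabricDelays:
    """Hypothetical per-command fabric delays, in milliseconds."""
    cmd_transit_ms: float        # host -> controller transmission of the cmd
    rsp_transit_ms: float        # controller -> host transmission of the rsp
    retransmit_ms: float = 0.0   # extra time spent on fabric retransmission


def host_observed_ms(controller_ms: float, fabric: FabricDelays) -> float:
    """Completion time as seen by the host caller (simple additive model)."""
    return (controller_ms + fabric.cmd_transit_ms
            + fabric.rsp_transit_ms + fabric.retransmit_ms)


def meets_limit_at_controller(cdl_ms: float, controller_ms: float) -> bool:
    """The only check the controller itself can make: its own service time."""
    return controller_ms <= cdl_ms


def meets_limit_at_host(cdl_ms: float, controller_ms: float,
                        fabric: FabricDelays) -> bool:
    """What the caller actually experiences, fabric time included."""
    return host_observed_ms(controller_ms, fabric) <= cdl_ms


if __name__ == "__main__":
    cdl_ms = 10.0
    clean = FabricDelays(cmd_transit_ms=0.5, rsp_transit_ms=0.5)
    lossy = FabricDelays(cmd_transit_ms=0.5, rsp_transit_ms=0.5,
                         retransmit_ms=5.0)
    for name, fabric in (("clean", clean), ("lossy", lossy)):
        print(name,
              meets_limit_at_controller(cdl_ms, 8.0),
              meets_limit_at_host(cdl_ms, 8.0, fabric))
```

With these made-up numbers the controller honours the limit in both cases, but only the loss-free path stays within it from the host's point of view.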

-- james




