Re: [LSF/MM/BPF ATTEND][LSF/MM/BPF Topic] Non-block IO

On 2/10/23 21:53, Jens Axboe wrote:
On 2/10/23 11:00 AM, Kanchan Joshi wrote:
Non-block IO is getting more common than it used to be.
NVMe is no longer tied to block storage. Command sets in NVMe 2.0 spec
opened an excellent way to present non-block interfaces to the Host. ZNS
and KV came along with it, and some new command sets are emerging.

OTOH, Kernel IO advances historically centered around the block IO path.
Passthrough IO path existed, but it stayed far from all the advances, be
it new features or performance.

Current state & discussion points:
---------------------------------
Status-quo changed in the recent past with the new passthrough path (ng
char interface + io_uring command). Feature parity does not exist, but
performance parity does.
Adoption draws asks. I propose a session covering a few voices and
finding a path-forward for some ideas too.

1. Command cancellation: while NVMe mandatorily supports the abort
command, we do not have a way to trigger that from user-space. There
are ways to go about it (with or without the uring-cancel interface) but
not without certain tradeoffs. It will be good to discuss the choices in
person.

This would require some rework of how the driver handles aborts today.
I'm unsure what cancellation guarantees io_uring provides, but we
need to understand whether they fit with the guarantees that nvme provides.

It is also unclear to me how this would work if different namespaces
are handed to different users, and have them all submit aborts on
the admin queue. How do you even differentiate which user sent which
command?


2. Cgroups: works for only block dev at the moment. Are there outright
objections to extending this to char-interface IO?

3. DMA cost: is high in presence of IOMMU. Keith posted the work[1],
with block IO path, last year. I imagine plumbing to get a bit simpler
with passthrough-only support. But what are the other things that must
be sorted out to have progress on moving DMA cost out of the fast path?

Yeah, this one is still pending... Would be nice to make some progress
there at some point.

4. Direct NVMe queues - will there be interest in having io_uring
managed NVMe queues?  Sort of a new ring, for which I/O is destaged from
io_uring SQE to NVMe SQE without having to go through intermediate
constructs (i.e., bio/request). Hopefully, that can further amp up the
efficiency of IO.
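
To make the "io_uring SQE to NVMe SQE" step concrete, here is a minimal sketch of such a direct translation for a read. The helper name, its parameters, and the 512-byte LBA default are assumptions for illustration, not an existing kernel or liburing interface. It also assumes the buffer fits in a single memory page and that no metadata is in use.

```python
import struct

NVME_CMD_READ = 0x02  # Read opcode in the NVM command set


def build_nvme_read_sqe(cid, nsid, byte_offset, byte_len, prp1, lba_size=512):
    """Pack a 64-byte NVMe Read SQE straight from ring-style request fields.

    Hypothetical helper: assumes the buffer fits in one memory page, so
    PRP2 stays zero, and that no metadata or protection info is in use.
    """
    if byte_len <= 0 or byte_offset % lba_size or byte_len % lba_size:
        raise ValueError("offset/length must be non-zero multiples of the LBA size")
    slba = byte_offset // lba_size
    nlb = byte_len // lba_size - 1          # NLB is a zero-based 16-bit count
    if nlb > 0xFFFF:
        raise ValueError("transfer too large for a single command")
    cdw0 = NVME_CMD_READ | (cid << 16)      # opcode in bits 7:0, CID in bits 31:16
    return struct.pack(
        "<IIQQQQIIIIII",
        cdw0,
        nsid,
        0,                  # CDW2-3, unused here
        0,                  # MPTR (metadata pointer), unused here
        prp1,               # PRP1: address of the data buffer
        0,                  # PRP2: unused under the single-page assumption
        slba & 0xFFFFFFFF,  # CDW10: starting LBA, low 32 bits
        slba >> 32,         # CDW11: starting LBA, high 32 bits
        nlb,                # CDW12: number of logical blocks (zero-based)
        0, 0, 0,            # CDW13-15, unused here
    )
```

The point of the sketch is only that the mapping is a handful of field copies and shifts. Anything that bio/request provides today (splitting, DMA mapping, timeouts) would still have to be handled somewhere.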

This is interesting, and I've pondered something like that before too. I
think it's worth investigating and hacking up a prototype. I recently
had one user of IOPOLL assume that setting up a ring with IOPOLL would
automatically create a polled queue on the driver side and that is what
would be used for IO. And while that's not how it currently works, it
definitely does make sense and we could make some things faster like
that.

I also think it can make sense, I'd use it if it was available.
Though io_uring may need to abstract the fact that the device may be
limited in the number of queues it supports. This would also need an
interface from the driver, which would need to understand how to
coordinate controller reset/teardown in the presence of "alien" queues.
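
One way to picture the two concerns above (a limited queue count, and reset/teardown with "alien" queues) is the bookkeeping sketch below. The class and method names are hypothetical, as is the policy of falling back to the shared path when no dedicated queue is left. It is not a proposal for the actual driver interface.

```python
class DedicatedQueuePool:
    """Hypothetical bookkeeping for ring-owned ("alien") NVMe I/O queues."""

    def __init__(self, io_queue_limit, kernel_queues):
        if not 0 <= kernel_queues <= io_queue_limit:
            raise ValueError("kernel_queues must be within the device queue limit")
        # I/O queue IDs start at 1; ID 0 is the admin queue.
        self.free = list(range(kernel_queues + 1, io_queue_limit + 1))
        self.owner = {}  # queue ID -> ring ID

    def attach(self, ring_id):
        """Return a dedicated queue ID, or None to signal the shared path."""
        if not self.free:
            return None
        qid = self.free.pop(0)
        self.owner[qid] = ring_id
        return qid

    def detach(self, qid):
        """Give a queue back when its ring goes away."""
        del self.owner[qid]
        self.free.append(qid)

    def quiesce_for_reset(self):
        """Release every ring-owned queue and report which rings were hit."""
        affected = sorted(set(self.owner.values()))
        self.free.extend(sorted(self.owner))
        self.owner.clear()
        return affected
```

With a limit of 4 I/O queues and 2 kept for the kernel, the third ring to ask gets None and would have to use the regular passthrough path.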

It would also potentially make it easier to enable the cancellation
referenced in #1 above, if it's restricted to the queue(s) that the ring "owns".


That could be a potential enforcement, correlating the command with
the dedicated queue. Still feels dangerous because if admin abort(s)
time out the driver really needs to reset the entire controller...
So it is not really "isolated" when it comes to aborts/cancellations.
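
A minimal sketch of that enforcement, assuming the ring tracks the set of submission queue IDs it owns (the function names and the string results are made up for illustration). The Abort command identifies its target by submission queue ID and command ID in CDW10, so the ownership check can be done on the SQID alone. The last helper captures the caveat: an abort that times out still escalates to a controller-wide reset.

```python
NVME_ADMIN_ABORT = 0x08  # Abort opcode in the admin command set


def abort_cdw10(sqid, cid):
    """Encode Abort CDW10: SQID in bits 15:0, CID in bits 31:16."""
    if not (0 <= sqid <= 0xFFFF and 0 <= cid <= 0xFFFF):
        raise ValueError("sqid and cid are 16-bit fields")
    return (cid << 16) | sqid


def check_abort(owned_sqids, sqid, cid):
    """Return CDW10 for the abort only if the ring owns the target queue."""
    if sqid not in owned_sqids:
        raise PermissionError(f"ring does not own submission queue {sqid}")
    return abort_cdw10(sqid, cid)


def on_abort_outcome(timed_out):
    """Hypothetical policy: a timed-out abort forces a full controller reset."""
    return "reset_controller" if timed_out else "done"
```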


