On Fri, Feb 10, 2023 at 11:30:33PM +0530, Kanchan Joshi wrote:
> is getting more common than it used to be.
> NVMe is no longer tied to block storage. Command sets in NVMe 2.0 spec
> opened an excellent way to present non-block interfaces to the Host. ZNS
> and KV came along with it, and some new command sets are emerging.
>
> OTOH, Kernel IO advances historically centered around the block IO path.
> Passthrough IO path existed, but it stayed far from all the advances, be
> it new features or performance.
>
> Current state & discussion points:
> ---------------------------------
> Status-quo changed in the recent past with the new passthrough path (ng
> char interface + io_uring command). Feature parity does not exist, but
> performance parity does.
> Adoption draws asks. I propose a session covering a few voices and
> finding a path-forward for some ideas too.
>
> 1. Command cancellation: while NVMe mandatorily supports the abort
> command, we do not have a way to trigger that from user-space. There
> are ways to go about it (with or without the uring-cancel interface) but
> not without certain tradeoffs. It will be good to discuss the choices in
> person.
>
> 2. Cgroups: works for only block dev at the moment. Are there outright
> objections to extending this to char-interface IO?

But blk-cgroup recently changed to associate with the disk only, which may
move it further away from supporting cgroups for pt IO.

Another thing is the io scheduler; I guess it isn't important for nvme any
more?

Also IO accounting.

>
> 3. DMA cost: is high in presence of IOMMU. Keith posted the work[1],
> with block IO path, last year. I imagine plumbing to get a bit simpler
> with passthrough-only support. But what are the other things that must
> be sorted out to have progress on moving DMA cost out of the fast path?
>
> 4. Direct NVMe queues - will there be interest in having io_uring
> managed NVMe queues?
> Sort of a new ring, for which I/O is destaged from
> io_uring SQE to NVMe SQE without having to go through intermediate
> constructs (i.e., bio/request). Hopefully, that can further amp up the
> efficiency of IO.

Interesting!

There is no bio for nvme io_uring command pt, but the request is still
there. If the SQE can provide a unique ID, the request may reuse it as its
tag.

Thanks,
Ming