On Wed, Apr 12, 2023 at 4:23 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 4/11/23 4:48 PM, Kanchan Joshi wrote:
> >>> 4. Direct NVMe queues - will there be interest in having io_uring
> >>> managed NVMe queues? Sort of a new ring, for which I/O is destaged from
> >>> io_uring SQE to NVMe SQE without having to go through intermediate
> >>> constructs (i.e., bio/request). Hopefully,that can further amp up the
> >>> efficiency of IO.
> >>
> >> This is interesting, and I've pondered something like that before too. I
> >> think it's worth investigating and hacking up a prototype. I recently
> >> had one user of IOPOLL assume that setting up a ring with IOPOLL would
> >> automatically create a polled queue on the driver side and that is what
> >> would be used for IO. And while that's not how it currently works, it
> >> definitely does make sense and we could make some things faster like
> >> that. It would also potentially easier enable cancelation referenced in
> >> #1 above, if it's restricted to the queue(s) that the ring "owns".
> >
> > So I am looking at prototyping it, exclusively for the polled-io case.
> > And for that, is there already a way to ensure that there are no
> > concurrent submissions to this ring (set with IORING_SETUP_IOPOLL
> > flag)?
> > That will be the case generally (and submissions happen under
> > uring_lock mutex), but submission may still get punted to io-wq
> > worker(s) which do not take that mutex.
> > So the original task and worker may get into doing concurrent submissions.
>
> io-wq may indeed get in your way. But I think for something like this,
> you'd never want to punt to io-wq to begin with. If userspace is managing
> the queue, then by definition you cannot run out of tags.

Unfortunately we have lifetime differences between io_uring and NVMe.
An NVMe tag remains valid/occupied until completion (we do not have a
nice sq->head to look at and decide).
For io_uring, an SQE slot can be reused much earlier, i.e. just after
submission.
So tag shortage is possible.

> If there are
> other conditions for this kind of request that may run into out-of-memory
> conditions, then the error just needs to be returned.

I see, and IOSQE_ASYNC can also be flagged as an error/not-supported.
Thanks.
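
To make the lifetime mismatch concrete, here is a minimal userspace sketch, not kernel code. Every name in it (struct direct_queue, dq_submit, dq_complete, QUEUE_DEPTH, SKETCH_F_ASYNC) is hypothetical and exists only for illustration; SKETCH_F_ASYNC is a local stand-in for IOSQE_ASYNC. It assumes a tiny queue of four tags, where a tag is held from submission until completion while the ring slot counts as consumed right after submission.

```c
/* Userspace simulation only: none of these names exist in the kernel. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define QUEUE_DEPTH     4u          /* hypothetical NVMe queue depth */
#define SKETCH_F_ASYNC  (1u << 0)   /* stand-in for IOSQE_ASYNC */

struct direct_queue {
    uint32_t busy_tags;    /* bit i set => tag i belongs to an in-flight command */
    uint32_t sq_consumed;  /* ring slots consumed, i.e. reusable by userspace */
};

/*
 * Submit one command. Returns the allocated tag (>= 0) on success,
 * -EOPNOTSUPP if the async flag is set (no io-wq punt in this model),
 * or -EAGAIN if every tag is still held by an uncompleted command.
 */
int dq_submit(struct direct_queue *q, uint32_t flags)
{
    if (flags & SKETCH_F_ASYNC)
        return -EOPNOTSUPP;

    for (uint32_t tag = 0; tag < QUEUE_DEPTH; tag++) {
        if (!(q->busy_tags & (1u << tag))) {
            q->busy_tags |= 1u << tag;  /* tag stays busy until completion */
            q->sq_consumed++;           /* ring slot is free again right away */
            return (int)tag;
        }
    }
    return -EAGAIN;                     /* tag shortage: report the error */
}

/* Complete a command; only now does its tag become available again. */
void dq_complete(struct direct_queue *q, int tag)
{
    assert(tag >= 0 && (uint32_t)tag < QUEUE_DEPTH);
    assert(q->busy_tags & (1u << tag)); /* must have been in flight */
    q->busy_tags &= ~(1u << tag);
}
```

In this model userspace can keep filling ring slots because they are released at submission, but the fifth submission without an intervening completion fails with -EAGAIN. That is the shortage described above, and it is surfaced as an error rather than being punted to a worker.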