On Thu, Jun 24, 2021 at 11:24:27AM +0200, Hannes Reinecke wrote:
> On 6/9/21 12:50 PM, Kanchan Joshi wrote:
> > Background & objectives:
> > ------------------------
> >
> > The NVMe passthrough interface
> >
> > Good part: allows new device-features to be usable (at least in raw
> > form) without having to build block-generic cmds, in-kernel users,
> > emulations and file-generic user-interfaces - all of which take some
> > time to evolve.
> >
> > Bad part: passthrough interface has remained tied to synchronous ioctl,
> > which is a blocker for performance-centric usage scenarios. User-space
> > can take the pain of implementing async-over-sync on its own but it does
> > not make much sense in a world that already has io_uring.
> >
> > Passthrough is lean in the sense it cuts through layers of abstractions
> > and reaches NVMe fast. One of the objectives here is to build a
> > scalable pass-through that can be readily used to play with new/emerging
> > NVMe features. Another is to surpass/match existing raw/direct block
> > I/O performance with this new in-kernel path.
> >
> > Recent developments:
> > --------------------
> > - NVMe now has a per-namespace char interface that remains available/usable
> >   even for unsupported features and for new command-sets [1].
> >
> > - Jens has proposed an async-ioctl-like facility 'uring_cmd' in io_uring. This
> >   introduces new possibilities (beyond storage); async-passthrough is one of
> >   those. Last posted version is V4 [2].
> >
> > - I have posted work on async nvme passthrough over block-dev [3]. Posted work
> >   is in V4 (in sync with the infra of [2]).
> >
> > Early performance numbers:
> > --------------------------
> > fio, randread, 4k bs, 1 job
> > Kiops, with varying QD:
> >
> > QD     Sync-PT    io_uring    Async-PT
> > 1      10.8       10.6        10.6
> > 2      10.9       24.5        24
> > 4      10.6       45          46
> > 8      10.9       90          89
> > 16     11.0       169         170
> > 32     10.6       308         307
> > 64     10.8       503         506
> > 128    10.9       592         596
> >
> > Further steps/discussion points:
> > --------------------------------
> > 1. Async-passthrough over nvme char-dev
> > It is in a shape to receive feedback, but I am not sure if the community
> > would like to take a look at that before settling on uring-cmd infra.
> >
> > 2. Once above gets in shape, bring other perf-centric features of io_uring to
> > this path -
> > A. SQPoll and register-file: already functional.
> > B. Passthrough polling: This can be enabled for block and looks feasible for
> > char-interface as well. Keith recently posted enabling polling for user
> > pass-through [4]
> > C. Pre-mapped buffers: Early thought is to let the buffers be registered by
> > io_uring, and add a new passthrough ioctl/uring_cmd in driver which does
> > everything that passthrough does except pinning/unpinning the pages.
> >
> > 3. Are there more things in the "io_uring->nvme->[block-layer]->nvme" path
> > which can be optimized?
> >
> > Ideally I'd like to cover a good deal of ground before Dec. But there seem
> > to be plenty of possibilities on this path. Discussion would help in how
> > best to move forward, and cement the ideas.
> >
> > [1] https://lore.kernel.org/linux-nvme/20210421074504.57750-1-minwoo.im.dev@xxxxxxxxx/
> > [2] https://lore.kernel.org/linux-nvme/20210317221027.366780-1-axboe@xxxxxxxxx/
> > [3] https://lore.kernel.org/linux-nvme/20210325170540.59619-1-joshi.k@xxxxxxxxxxx/
> > [4] https://lore.kernel.org/linux-block/20210517171443.GB2709391@xxxxxxxxxxxxxxxxxxxxxxxxxxx/#t
> >
> I do like the idea.
>
> What I would like to see is to make the uring_cmd infrastructure
> generally available, such that we can port the SCSI sg asynchronous
> interface over to this.

What prevents you from doing this already? I think we just need more
patch reviews for the generic io_uring cmd patches, no?

  Luis
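
For anyone following along, here is a rough userspace sketch of the
"async-over-sync" workaround mentioned in the quoted background: keeping
several blocking calls in flight by giving each one its own worker thread.
It is only an illustration. sync_passthrough() is a hypothetical stand-in
that sleeps instead of issuing a real ioctl, and run_async_over_sync() and
its parameters are names made up for this example, not anything taken from
the posted patches.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def sync_passthrough(cmd_id, latency_s=0.001):
    """Hypothetical stand-in for a blocking passthrough ioctl.

    A real caller would issue ioctl() on the NVMe device here; the sleep
    only simulates the device round trip.
    """
    time.sleep(latency_s)
    return cmd_id, 0  # (command id, status 0 = success)


def run_async_over_sync(num_cmds, queue_depth):
    """Keep up to `queue_depth` blocking calls in flight using worker threads."""
    with ThreadPoolExecutor(max_workers=queue_depth) as pool:
        # map() submits every command up front and yields results in
        # submission order; each worker blocks inside its own call.
        return list(pool.map(sync_passthrough, range(num_cmds)))


if __name__ == "__main__":
    for qd in (1, 8):
        start = time.perf_counter()
        results = run_async_over_sync(num_cmds=64, queue_depth=qd)
        elapsed = time.perf_counter() - start
        print(f"QD={qd}: {len(results)} cmds in {elapsed * 1000:.1f} ms")
```

The point of the sketch is the cost: every in-flight command pins one
blocked thread, which is the overhead an io_uring-based submission path is
meant to avoid.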