On 6/9/21 12:50 PM, Kanchan Joshi wrote:
> Background & objectives:
> ------------------------
>
> The NVMe passthrough interface:
>
> Good part: it allows new device features to be usable (at least in raw
> form) without having to build block-generic commands, in-kernel users,
> emulations and file-generic user interfaces - all of this takes time
> to evolve.
>
> Bad part: the passthrough interface has remained tied to the
> synchronous ioctl, which is a blocker for performance-centric usage
> scenarios. User-space can take the pain of implementing async-over-sync
> on its own, but that does not make much sense in a world that already
> has io_uring.
>
> Passthrough is lean in the sense that it cuts through layers of
> abstraction and reaches NVMe fast. One objective here is to build a
> scalable passthrough that can be readily used to play with new/emerging
> NVMe features. Another is to match/surpass existing raw/direct block
> I/O performance with this new in-kernel path.
>
> Recent developments:
> --------------------
> - NVMe now has a per-namespace char interface that remains
>   available/usable even for unsupported features and for new
>   command sets [1].
>
> - Jens has proposed an async-ioctl-like facility, 'uring_cmd', in
>   io_uring. This introduces new possibilities (beyond storage);
>   async passthrough is one of those. The last posted version is v4 [2].
>
> - I have posted work on async nvme passthrough over the block-dev [3].
>   The posted work is at v4 (in sync with the infra of [2]).
>
> Early performance numbers:
> --------------------------
> fio, randread, 4k bs, 1 job
> Kiops, with varying QD:
>
> QD     Sync-PT    io_uring    Async-PT
> 1      10.8       10.6        10.6
> 2      10.9       24.5        24
> 4      10.6       45          46
> 8      10.9       90          89
> 16     11.0       169         170
> 32     10.6       308         307
> 64     10.8       503         506
> 128    10.9       592         596
>
> Further steps/discussion points:
> --------------------------------
> 1. Async passthrough over the nvme char-dev:
>    It is in a shape to receive feedback, but I am not sure whether the
>    community would like to look at that before settling on the
>    uring_cmd infra.
>
> 2. Once the above is in shape, bring other perf-centric features of
>    io_uring to this path -
>    A. SQPoll and register-file: already functional.
>    B. Passthrough polling: this can be enabled for block and looks
>       feasible for the char interface as well. Keith recently posted
>       enabling polling for user passthrough [4].
>    C. Pre-mapped buffers: the early thought is to let the buffers be
>       registered via io_uring, and to add a new passthrough
>       ioctl/uring_cmd in the driver which does everything that
>       passthrough does except pinning/unpinning the pages.
>
> 3. Are there more things in the "io_uring->nvme->[block-layer]->nvme"
>    path which can be optimized?
>
> Ideally I'd like to cover a good deal of ground before Dec., but there
> seem to be plenty of possibilities on this path. Discussion would help
> in deciding how best to move forward, and in cementing the ideas.
>
> [1] https://lore.kernel.org/linux-nvme/20210421074504.57750-1-minwoo.im.dev@xxxxxxxxx/
> [2] https://lore.kernel.org/linux-nvme/20210317221027.366780-1-axboe@xxxxxxxxx/
> [3] https://lore.kernel.org/linux-nvme/20210325170540.59619-1-joshi.k@xxxxxxxxxxx/
> [4] https://lore.kernel.org/linux-block/20210517171443.GB2709391@xxxxxxxxxxxxxxxxxxxxxxxxxxx/#t
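
For concreteness, a hedged sketch of what one 4KiB passthrough read
over uring_cmd could look like from userspace. The identifiers used
here (IORING_OP_URING_CMD, NVME_URING_CMD_IO, struct nvme_uring_cmd,
IORING_SETUP_SQE128) follow the shape the interface later stabilized
into upstream rather than the exact v4 patches, and the device name
/dev/ng0n1, nsid 1 and 4KiB LBA size are assumptions for illustration:

/* Build with: gcc -O2 -o pt pt.c -luring
 * Needs kernel headers and a liburing with uring_cmd support. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <liburing.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct nvme_uring_cmd *cmd;
	void *buf = NULL;
	int fd;

	/* uring_cmd carries the passthrough command in a 128-byte SQE. */
	if (io_uring_queue_init(8, &ring, IORING_SETUP_SQE128))
		return 1;

	fd = open("/dev/ng0n1", O_RDWR);	/* per-namespace char dev [1] */
	if (fd < 0) { perror("open"); return 1; }
	if (posix_memalign(&buf, 4096, 4096))
		return 1;

	sqe = io_uring_get_sqe(&ring);
	if (!sqe)
		return 1;
	memset(sqe, 0, 2 * sizeof(*sqe));	/* zero the full 128-byte SQE */
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = fd;
	sqe->cmd_op = NVME_URING_CMD_IO;

	/* The NVMe command itself lives in the SQE's command area. */
	cmd = (struct nvme_uring_cmd *)sqe->cmd;
	cmd->opcode = 0x02;			/* NVMe Read */
	cmd->nsid = 1;
	cmd->addr = (uintptr_t)buf;
	cmd->data_len = 4096;
	cmd->cdw12 = 0;				/* NLB is 0-based: one LBA */

	/* Submission returns immediately and completion arrives as a CQE,
	 * so a single thread can keep many commands in flight - unlike the
	 * one-blocking-ioctl-at-a-time sync path. */
	io_uring_submit(&ring);
	if (!io_uring_wait_cqe(&ring, &cqe)) {
		printf("res: %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	free(buf);
	close(fd);
	return 0;
}
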
I do like the idea. What I would like to see is the uring_cmd
infrastructure made generally available, such that we can port the SCSI
sg asynchronous interface over to it. Doug Gilbert has been fighting a
lone battle to improve the sg asynchronous interface, as the current
one is deemed a security hazard. But in the absence of a generic
interface he had to design his own ioctls, with all the expected
pushback. Plus there are only so many people who care about sg
internals :-(
Being able to use uring_cmd would be a neat way out of this; a rough
sketch of what such a hook could look like follows below.

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
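
What "generally available" could mean concretely: in the shape the
infrastructure later took upstream, a driver opts in by implementing a
->uring_cmd() file operation, rather than inventing its own async
ioctls. sg never actually gained such a hook, so the wiring below is
purely hypothetical and only a stub:

#include <linux/fs.h>
#include <linux/module.h>
#include <linux/io_uring.h>

/* Hypothetical sg hook: decode the SCSI request from the SQE's command
 * payload, start it asynchronously, and return -EIOCBQUEUED to tell
 * io_uring the command is in flight; completion would be signalled
 * later via io_uring_cmd_done(). The actual sg plumbing is elided. */
static int sg_uring_cmd(struct io_uring_cmd *ioucmd,
			unsigned int issue_flags)
{
	return -EOPNOTSUPP;	/* stub only */
}

static const struct file_operations sg_fops = {
	.owner     = THIS_MODULE,
	/* existing sg read/write/ioctl/poll entries elided */
	.uring_cmd = sg_uring_cmd,
};

With a hook like this, the same io_uring submission/completion model
shown above for NVMe would apply to sg, without driver-private ioctls.
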