On Thu, Feb 24, 2022 at 2:12 AM Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
>
> >>> Actually, I'd rather have something like an 'inverse io_uring', where
> >>> an application creates a memory region separated into several 'ring'
> >>> for submission and completion.
> >>> Then the kernel could write/map the incoming data onto the rings, and
> >>> application can read from there.
> >>> Maybe it'll be worthwhile to look at virtio here.

Another advantage that comes to mind, especially if the userspace target
needs to operate on the data anyway: if we're forwarding to io_uring
based networking, or other userspace networking, reading a direct
mapping may be quicker than opening a file & reading it. (I think one
idea for parallel/out-of-order processing was fd-per-request; if that is
too much overhead or too limited by fd count, perhaps mapping is just
the way to go.)

> >>
> >> There is lio loopback backed by tcmu... I'm assuming that nvmet can
> >> hook into the same/similar interface. nvmet is pretty lean, and we
> >> can probably help tcmu/equivalent scale better if that is a concern...
> >
> > Sagi,
> >
> > I looked at tcmu prior to starting this work. Other than the tcmu
> > overhead, one concern was the complexity of a scsi device interface
> > versus sending block requests to userspace.
>
> The complexity is understandable, though it can be viewed as a
> capability as well. Note I do not have any desire to promote tcmu here,
> just trying to understand if we need a brand new interface rather than
> making the existing one better.
>
> > What would be the advantage of doing it as a nvme target over delivering
> > directly to userspace as a block driver?
>
> Well, for starters you gain the features and tools that are extensively
> used with nvme. Plus you get the ecosystem support (development,
> features, capabilities and testing). There are clear advantages of
> plugging into an established ecosystem.

I recall that when we discussed an nvme-style approach, another
advantage was that the nvme target implementation could be re-used when
exposing the same interface via this userspace block device interface,
or e.g. when presenting an nvme device to a VM, etc.

That said, for a device that just needs to support read/write & forward
data to some userspace networked storage, the overhead in implementation
and interface should be considered. If there's already a rich set of
tooling here to create a custom nvme target, perhaps that could be
leveraged?

Maybe there's a middle ground? What if we do an "inverse io_uring" -
forwarding the block interface into userspace, and letting those who
choose to also implement passthrough commands (to get the extra
"capability")? That would provide an efficient mechanism to forward
block requests to userspace, while allowing each target to implement its
favorite flavor on top. (A rough sketch of what the shared rings might
look like is below.)

Khazhy
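
For concreteness, here is a minimal sketch of what such a shared
submission/completion ring layout and userspace service loop might look
like. All of the ufwd_* names and fields are made up for illustration;
this is not an existing kernel ABI, just one possible shape for a
mapping-based interface:

/*
 * Illustrative sketch only: the kernel fills a submission ring with
 * block requests whose data pages are mapped into the same shared
 * region; userspace services them and posts completions.
 */
#include <stdint.h>
#include <stdatomic.h>

struct ufwd_sqe {			/* written by the kernel */
	uint8_t  opcode;		/* READ/WRITE/FLUSH/passthrough... */
	uint8_t  flags;
	uint16_t tag;			/* echoed back in the completion */
	uint32_t nr_sectors;
	uint64_t start_sector;
	uint64_t data_offset;		/* data pages within the shared map */
};

struct ufwd_cqe {			/* written by userspace */
	uint16_t tag;
	int32_t  result;		/* bytes done or negative errno */
};

struct ufwd_ring {			/* one for SQ, one for CQ */
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
	uint32_t ring_mask;		/* ring_entries - 1 */
	uint32_t ring_entries;
};

/* Userspace service loop, io_uring style: consume submissions, service
 * them against the mapped data pages, post completions. */
static void ufwd_service(struct ufwd_ring *sq, struct ufwd_sqe *sqes,
			 struct ufwd_ring *cq, struct ufwd_cqe *cqes,
			 void *data_base)
{
	uint32_t head = atomic_load_explicit(&sq->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&sq->tail, memory_order_acquire);

	while (head != tail) {
		struct ufwd_sqe *sqe = &sqes[head & sq->ring_mask];
		char *buf = (char *)data_base + sqe->data_offset;
		uint32_t ctail;

		/*
		 * Service the request using 'buf' directly, e.g. hand the
		 * mapped pages to io_uring-based networking without an
		 * extra copy.
		 */
		(void)buf;

		ctail = atomic_load_explicit(&cq->tail, memory_order_relaxed);
		cqes[ctail & cq->ring_mask] = (struct ufwd_cqe){
			.tag    = sqe->tag,
			.result = (int32_t)(sqe->nr_sectors << 9),
		};
		atomic_store_explicit(&cq->tail, ctail + 1,
				      memory_order_release);
		head++;
	}
	atomic_store_explicit(&sq->head, head, memory_order_release);
}

The point being that, as with io_uring, the kernel and the target only
exchange indices and small descriptors; the data itself is already
mapped, so a networked target can hand it straight to its send path.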