Actually, I'd rather have something like an 'inverse io_uring', where an
application creates a memory region divided into several rings for
submission and completion.
The kernel could then write/map the incoming data onto the rings, and
the application could read it from there.
Maybe it'll be worthwhile to look at virtio here.
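
To make the idea a bit more concrete, here is a rough userspace sketch of
what one such ring could look like. It is only an illustration under my
own assumptions: a single producer (standing in for the kernel) and a
single consumer (the application) share one region and exchange
fixed-size slots. The names (struct ring, ring_push, ring_pop,
RING_ENTRIES, SLOT_BYTES) are all hypothetical, and this is not the
actual io_uring or virtio ring layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_ENTRIES 8u   /* hypothetical ring size; must be a power of two */
#define SLOT_BYTES   64u  /* hypothetical per-slot payload capacity */

static_assert((RING_ENTRIES & (RING_ENTRIES - 1)) == 0,
              "RING_ENTRIES must be a power of two");

struct slot {
    uint32_t len;               /* number of valid bytes in data[] */
    uint8_t  data[SLOT_BYTES];
};

/* One ring inside the shared region. In the proposal the producer would
 * be the kernel and the consumer the application; a second ring of the
 * same shape could carry completions in the opposite direction. */
struct ring {
    _Atomic uint32_t head;      /* next slot to read, written by consumer */
    _Atomic uint32_t tail;      /* next slot to fill, written by producer */
    struct slot slots[RING_ENTRIES];
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: copy one message into the ring.
 * Returns false if the ring is full or the message does not fit a slot. */
static bool ring_push(struct ring *r, const void *buf, uint32_t len)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (len > SLOT_BYTES || tail - head == RING_ENTRIES)
        return false;

    struct slot *s = &r->slots[tail & (RING_ENTRIES - 1)];
    s->len = len;
    memcpy(s->data, buf, len);

    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: copy the oldest message out of the ring.
 * buf must have room for SLOT_BYTES. Returns false if the ring is empty. */
static bool ring_pop(struct ring *r, void *buf, uint32_t *len)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false;

    const struct slot *s = &r->slots[head & (RING_ENTRIES - 1)];
    *len = s->len;
    memcpy(buf, s->data, s->len);

    /* Hand the slot back to the producer only after copying it out. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

The head/tail counters are free-running and masked on access, so full
and empty are distinguishable without wasting a slot. A real interface
would also need a wakeup/notification mechanism and some way to map
payload pages instead of copying, which this sketch leaves out.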
There is LIO loopback backed by tcmu... I'm assuming that nvmet can
hook into the same or a similar interface. nvmet is pretty lean, and we
can probably help tcmu (or its equivalent) scale better if that is a
concern...