On 22/04/2022 17.50, Jens Axboe wrote:
On 4/13/22 4:33 AM, Avi Kivity wrote:
Unfortunately, only ideas, no patches. But at least the first seems very easy.
- IORING_OP_MEMCPY_IMMEDIATE - copy some payload included in the op
itself (1-8 bytes) to a user memory location specified by the op.
Linked to another op, this can generate an in-memory notification
useful for busy-waiters or the UMWAIT instruction.
This would be useful for Seastar, which looks at a timer-managed
memory location to check when to break computation loops.
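
A minimal sketch of the consumer side described above, assuming a
compute loop that polls a flag word which some other agent (here, the
proposed op) would write. The names need_preempt and
compute_until_notified are made up for illustration and are not
Seastar's actual identifiers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical notification word; the proposed op would store a
 * non-zero value here when the linked operation completes. */
static _Atomic uint64_t need_preempt;

/* Run up to max_iters units of work, stopping early once the
 * notification word becomes non-zero. Returns the units completed. */
static uint64_t compute_until_notified(uint64_t max_iters)
{
    uint64_t done = 0;

    while (done < max_iters) {
        /* Cheap relaxed load: the loop only needs to notice the
         * store eventually, not to order other memory against it. */
        if (atomic_load_explicit(&need_preempt, memory_order_relaxed))
            break;
        done++; /* stand-in for one unit of real computation */
    }
    return done;
}
```

The check costs one load per iteration, which is why an in-memory
notification is attractive compared to a syscall or a CQ poll.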
This one would indeed be trivial to do. If we limit the max size
supported to eg 8 bytes like suggested, then it could be in the sqe
itself and just copied to the user address specified.
Eg have sqe->len be the length (1..8 bytes), sqe->addr the destination
address, and sqe->off the data to copy.
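
To make the field mapping concrete, here is a userspace emulation of
the semantics as I read them. It is only a sketch: struct
memcpy_imm_req is a hypothetical stand-in holding the three fields
named above, not the real struct io_uring_sqe, and
memcpy_imm_execute is an invented name, not kernel code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the three sqe fields mentioned above. */
struct memcpy_imm_req {
    uint64_t off;  /* inline payload: the bytes to copy */
    uint64_t addr; /* destination address */
    uint32_t len;  /* number of bytes to copy, 1..8 */
};

/* Emulates what the op would do: validate len, then copy the first
 * len bytes of the inline payload to the destination. */
static int memcpy_imm_execute(const struct memcpy_imm_req *req)
{
    if (req->len < 1 || req->len > sizeof(req->off))
        return -EINVAL;

    /* Bytes are taken in memory order, so for len < 8 the result
     * depends on host endianness (low-order bytes on little-endian). */
    memcpy((void *)(uintptr_t)req->addr, &req->off, req->len);
    return 0;
}
```

A real implementation would use a user-copy helper instead of a plain
memcpy, since the destination is a user address.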
If you'll commit to testing this, I can hack it up pretty quickly...
I can certainly commit to test it in a VM (my workstation has a
hate-hate relationship with custom kernels).
- IORING_OP_MEMCPY - asynchronously copy memory
Some CPUs include a DMA engine, and io_uring is a perfect interface to
exercise it. It may be difficult to find space for two iovecs though.
I've considered this one in the past too, and it is indeed an ideal fit
in terms of API. Outside of the DMA engines, it can also be used to
offload memcpy to a GPU, for example.
The io_uring side would not be hard to wire up, basically just have the
sqe specify source, destination, length. Add some well-defined flags
depending on what the copy engine offers, for example.
But probably some work required here in exposing an API and testing
etc...
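
A rough sketch of what such a request could carry, with a plain
software fallback standing in for the copy engine. Everything here is
hypothetical: struct copy_req, COPY_F_MAY_OVERLAP and
copy_req_execute_sw are illustrative names, not existing io_uring or
dmaengine definitions.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag: caller states the ranges may overlap. */
#define COPY_F_MAY_OVERLAP (1u << 0)

/* Hypothetical request: source, destination, length, flags. */
struct copy_req {
    uint64_t src;
    uint64_t dst;
    uint32_t len;
    uint32_t flags;
};

/* Software fallback: reject unknown flags so they stay available
 * for future engine features, then do the copy on the CPU. */
static int copy_req_execute_sw(const struct copy_req *req)
{
    void *dst = (void *)(uintptr_t)req->dst;
    const void *src = (const void *)(uintptr_t)req->src;

    if (req->flags & ~COPY_F_MAY_OVERLAP)
        return -EINVAL;

    if (req->flags & COPY_F_MAY_OVERLAP)
        memmove(dst, src, req->len); /* safe for overlapping ranges */
    else
        memcpy(dst, src, req->len);
    return 0;
}
```

Rejecting unknown flag bits up front is the usual way to keep the
flags field extensible once different engines expose different
capabilities.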
Perhaps the interface should be kept separate from io_uring, e.g. use a
pidfd to represent the address space, and then issue
IORING_OP_PREADV/IORING_OP_PWRITEV to initiate dma. Then one can copy
across process boundaries.
A different angle is to expose the dma device as a separate fd. This
can be useful as dma engine can often do other operations, like xor or
crc or encryption or compression. In any case I'd argue for the
interface to be useful outside io_uring, although that considerably
increases the scope. I also don't have a direct use case for it, though
I'm sure others will.
The kernel itself should find the DMA engine useful for things like
memory compaction.