Re: memory access op ideas

On 4/24/2022 5:45 PM, Jens Axboe wrote:
On 4/24/22 8:56 AM, Avi Kivity wrote:

On 24/04/2022 16.30, Jens Axboe wrote:
On 4/24/22 7:04 AM, Avi Kivity wrote:
On 23/04/2022 20.30, Jens Axboe wrote:
On 4/23/22 10:23 AM, Avi Kivity wrote:
Perhaps the interface should be kept separate from io_uring. e.g. use
a pidfd to represent the address space, and then issue
IORING_OP_PREADV/IORING_OP_PWRITEV to initiate DMA. Then one can copy
across process boundaries.
Then you just made it a ton less efficient, particularly if you used the
vectored read/write. For this to make sense, I think it has to be a
separate op. At least that's the only implementation I'd be willing to
entertain for the immediate copy.

Sorry, I caused a lot of confusion by bundling immediate copy and a
DMA engine interface. For sure the immediate copy should be a direct
implementation like you posted!

User-to-user copies are another matter. I feel like that should be a
stand-alone driver, and that io_uring should be an io_uring-y way to
access it. Just like io_uring isn't an NVMe driver.
Not sure I understand your logic here or the io_uring vs NVMe driver
reference, to be honest. io_uring _is_ a standalone way to access it,
you can use it sync or async through that.

If you're talking about a standalone op vs being useful from a command
itself, I do think both have merit and I can see good use cases for
both.

I'm actually not so certain that the model where io_uring has special operations for driving DMA engines works out. I think in all cases you can accomplish what you want by reading from or writing to existing file constructs, and just having those transparently offload to a DMA engine on your behalf if one is available.

As a concrete example, let's take an inter-process copy. The main challenges with this one are the security model (who's allowed to copy where?) and synchronization between the two applications (when did the data change?).

Rather, I'd consider implementing the inter-process copy using an existing mechanism like a Unix domain socket. The sender maybe does a MSG_ZEROCOPY send via io_uring, the receiver does an async recv, and the kernel can use a DMA engine to move the data directly between the two buffers if it has one available. Then you get the existing security model and coordination, and software works whether there's a DMA engine available or not.
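
Roughly the shape I have in mind, as a liburing sketch. This is purely illustrative: the socket fds and buffers are made up, the SO_ZEROCOPY setup and error-queue notifications are omitted, and it assumes MSG_ZEROCOPY is honored on AF_UNIX, which is part of the proposal here rather than something today's kernel promises.

/*
 * Sketch only: sender queues a zero-copy send, receiver queues a recv,
 * and the kernel is free to satisfy the pair with a DMA engine if one
 * is available. Assumes MSG_ZEROCOPY semantics on a Unix domain socket.
 */
#include <errno.h>
#include <sys/socket.h>
#include <liburing.h>

static int queue_copy(struct io_uring *ring, int tx_sock, int rx_sock,
                      const void *src, void *dst, size_t len)
{
        struct io_uring_sqe *sqe;

        /* Sender side: zero-copy send of the source buffer. */
        sqe = io_uring_get_sqe(ring);
        if (!sqe)
                return -ENOMEM;
        io_uring_prep_send(sqe, tx_sock, src, len, MSG_ZEROCOPY);
        sqe->user_data = 1;

        /* Receiver side: async recv into the destination buffer. */
        sqe = io_uring_get_sqe(ring);
        if (!sqe)
                return -ENOMEM;
        io_uring_prep_recv(sqe, rx_sock, dst, len, 0);
        sqe->user_data = 2;

        return io_uring_submit(ring);
}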

It's a similar story for copying to memory on a PCI device. You'd need some security model to decide whether you're allowed to copy there, which is probably best expressed by opening a file that represents that BAR and then doing reads/writes to it.
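
Something like the sketch below is what I mean; the device node path and offset are hypothetical placeholders for "a file that represents the BAR", and error/cleanup handling is trimmed:

/*
 * Sketch: permission comes from being able to open the file at all,
 * and the copy is just a positioned write that the kernel may choose
 * to offload to a DMA engine.
 */
#include <fcntl.h>
#include <liburing.h>

static int write_to_bar(struct io_uring *ring, const void *buf, size_t len)
{
        struct io_uring_sqe *sqe;
        int fd;

        fd = open("/dev/example-bar0", O_RDWR);   /* hypothetical BAR node */
        if (fd < 0)
                return -1;

        sqe = io_uring_get_sqe(ring);
        if (!sqe)
                return -1;
        io_uring_prep_write(sqe, fd, buf, len, 0 /* offset into the BAR */);

        return io_uring_submit(ring);
}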

This is at least the direction I've been pursuing. The DMA engine channel is associated with the io_uring and the kernel just intelligently offloads whatever it can.



I'm saying that if DMA is exposed to userspace, it should have a
regular synchronous interface (maybe open("/dev/dma"), maybe something
else). io_uring adds asynchrony to everything, but it's not
everything's driver.

Sure, my point is that if/when someone wants to add that, they should be
free to do so. It's not a fair requirement to put on someone doing the
initial work on wiring this up. It may not be something they would want
to use to begin with, and it's perfectly easy to run io_uring in sync
mode should you wish to do so. The hard part is making the
issue+complete separate actions; rolling a sync API on top of that would
be trivial.
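
For what it's worth, with liburing that sync wrapper really is just a submit-and-wait. A minimal sketch, assuming the caller has already prepped an SQE:

#include <liburing.h>

static int submit_and_wait_one(struct io_uring *ring)
{
        struct io_uring_cqe *cqe;
        int ret;

        ret = io_uring_submit_and_wait(ring, 1);  /* issue + wait in one call */
        if (ret < 0)
                return ret;

        ret = io_uring_wait_cqe(ring, &cqe);      /* fetch the completion */
        if (ret < 0)
                return ret;

        ret = cqe->res;                           /* result of the operation */
        io_uring_cqe_seen(ring, cqe);
        return ret;
}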

Just FYI, but the Intel idxd driver already has a userspace interface that's async/poll-mode. Commands are submitted to an mmap'd portal using the MOVDIR64B/ENQCMD instructions directly. It does not expose an fd you can read/write to in order to trigger copies, so it is not compatible with io_uring, but it doesn't really need to be since it is already async.
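
For reference, that user-space flow looks roughly like the sketch below. Treat it as an illustration rather than the driver's documented recipe: the workqueue path is a typical accel-config name assumed here, it assumes a dedicated WQ (hence MOVDIR64B rather than ENQCMD), and the polling and error handling are simplified. Build with -mmovdir64b.

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>
#include <x86intrin.h>
#include <linux/idxd.h>

static int dsa_memmove(void *dst, const void *src, size_t len)
{
        struct dsa_completion_record comp __attribute__((aligned(32))) = { 0 };
        struct dsa_hw_desc desc = { 0 };
        void *portal;
        int fd;

        fd = open("/dev/dsa/wq0.0", O_RDWR);        /* assumed WQ node name */
        if (fd < 0)
                return -1;

        /* Map the dedicated WQ's submission portal. */
        portal = mmap(NULL, 0x1000, PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd, 0);
        if (portal == MAP_FAILED)
                return -1;

        desc.opcode = DSA_OPCODE_MEMMOVE;
        desc.flags = IDXD_OP_FLAG_CRAV | IDXD_OP_FLAG_RCR;
        desc.src_addr = (uintptr_t)src;
        desc.dst_addr = (uintptr_t)dst;
        desc.xfer_size = len;
        desc.completion_addr = (uintptr_t)&comp;

        _movdir64b(portal, &desc);                  /* 64-byte atomic submit */

        while (comp.status == 0)                    /* poll the completion record */
                _mm_pause();

        munmap(portal, 0x1000);
        close(fd);
        return comp.status == DSA_COMP_SUCCESS ? 0 : -1;
}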

What isn't currently exposed to userspace is access to the "dmaengine" framework. Prior to the patchset I have pending that I linked earlier in the thread, the "dmaengine" framework couldn't really operate in async/poll mode or handle out-of-order processing, etc. But after that series, maybe it can be.


Anyway, maybe we drifted off somewhere; this should probably be decided by
pragmatic concerns (like whatever the author of the driver prefers).

Indeed!




