On Tue, Mar 29, 2022 at 06:16:56PM +0100, Paul Cercueil wrote:
> Hi Daniel,
>
> On Tue, Mar 29 2022 at 16:10:44 +0200, Daniel Vetter <daniel@xxxxxxxx> wrote:
> > On Tue, Mar 29, 2022 at 10:11:14AM +0100, Paul Cercueil wrote:
> > > Hi Daniel,
> > >
> > > On Tue, Mar 29 2022 at 10:33:32 +0200, Daniel Vetter <daniel@xxxxxxxx> wrote:
> > > > On Tue, Feb 15, 2022 at 05:43:35PM +0000, Paul Cercueil wrote:
> > > > > Hi Jonathan,
> > > > >
> > > > > On Sun, Feb 13 2022 at 18:46:16 +0000, Jonathan Cameron <jic23@xxxxxxxxxx> wrote:
> > > > > > On Mon, 7 Feb 2022 12:59:21 +0000 Paul Cercueil <paul@xxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > Hi Jonathan,
> > > > > > >
> > > > > > > This is the V2 of my patchset that introduces a new userspace interface based on DMABUF objects to complement the fileio API, and adds write() support to the existing fileio API.
> > > > > >
> > > > > > Hi Paul,
> > > > > >
> > > > > > It's been a little while. Perhaps you could summarize the various viewpoints around the appropriateness of using DMABUF for this? I appreciate it is a tricky topic to distil into a brief summary, but I know I would find it useful even if no one else does!
> > > > >
> > > > > So we want to have a high-speed interface where buffers of samples are passed around between IIO devices and other devices (e.g. USB or network), or made available to userspace without copying the data.
> > > > >
> > > > > DMABUF is, at least in theory, exactly what we need. Quoting the documentation (https://www.kernel.org/doc/html/v5.15/driver-api/dma-buf.html): "The dma-buf subsystem provides the framework for sharing buffers for hardware (DMA) access across multiple device drivers and subsystems, and for synchronizing asynchronous hardware access. This is used, for example, by drm “prime” multi-GPU support, but is of course not limited to GPU use cases."
> > > > >
> > > > > The problem is that right now DMABUF is only really used by DRM, and to quote Daniel, "dma-buf looks like something super generic and useful, until you realize that there's a metric ton of gpu/accelerator baggage piled in".
> > > > >
> > > > > Still, it seems to be the only viable option. We could add a custom buffer-passing interface, but that would mean implementing the same buffer-passing interface in the network and USB stacks, and before we know it we've re-invented DMABUFs.
> > > >
> > > > dma-buf also doesn't support sharing with the network and USB stacks, so I'm a bit confused why exactly this is useful?
> > >
> > > There is an attempt to get dma-buf support into the network stack, called "zctap". The last patchset was sent last November. The USB stack does not support dma-buf, but we can add it later, I guess.
> > >
> > > > So yeah, unless there's some sharing going on with gpu stuff (for data processing maybe), I'm not sure this makes a lot of sense really. Or at least some zero-copy sharing between drivers, but even that would minimally require a dma-buf import ioctl of some sort. Which I either missed or doesn't exist.
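For reference, the "import" side Daniel is referring to is the standard dma-buf consumer sequence. Below is a minimal sketch of what a kernel driver would do with an fd handed in by such an ioctl; this is not code from the patchset, "struct my_dev" and the way the fd reaches the driver are invented for illustration, and only the dma_buf_*() calls are the existing API from <linux/dma-buf.h>.

/*
 * Sketch of the consumer (import) side of dma-buf inside a kernel driver.
 * "my_dev" and my_dev_import_dmabuf() are hypothetical placeholders.
 */
#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/err.h>

struct my_dev {
	struct device *dev;	/* hypothetical driver-private struct */
};

static int my_dev_import_dmabuf(struct my_dev *mdev, int fd)
{
	struct dma_buf *dmabuf;
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	/* Take a reference on the buffer behind the userspace fd. */
	dmabuf = dma_buf_get(fd);
	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);

	/* Attach our device, so the exporter knows who will access it. */
	attach = dma_buf_attach(dmabuf, mdev->dev);
	if (IS_ERR(attach)) {
		dma_buf_put(dmabuf);
		return PTR_ERR(attach);
	}

	/* Map the buffer into a scatterlist usable by our DMA engine. */
	sgt = dma_buf_map_attachment(attach, DMA_FROM_DEVICE);
	if (IS_ERR(sgt)) {
		dma_buf_detach(dmabuf, attach);
		dma_buf_put(dmabuf);
		return PTR_ERR(sgt);
	}

	/*
	 * ... program the hardware with sgt; once the transfer is done,
	 * undo everything in reverse order:
	 * dma_buf_unmap_attachment(attach, sgt, DMA_FROM_DEVICE);
	 * dma_buf_detach(dmabuf, attach);
	 * dma_buf_put(dmabuf);
	 */
	return 0;
}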
> > > We do want zero-copy between drivers, the network stack, and the USB stack. It's not just about having a userspace interface.
> >
> > I think in that case we need these other pieces too. And we need acks from the relevant subsystems that these other pieces are ready for upstream merging, and also that the dma-buf side of things actually makes sense.
>
> Ok...
>
> > > > If there's none of that, then just hand-roll your buffer handling code (xarray is cheap to use in terms of code for this); you can always add dma-buf import/export later on when the need arises.
> > > >
> > > > Scrolling through the patches you only have dma-buf export, but no importing, so the use-case that works is with one of the existing subsystems that support dma-buf importing.
> > > >
> > > > I think minimally we need the use-case (in the form of code) that needs the buffer sharing here.
> > >
> > > I'll try with zctap and report back.
> >
> > Do you have a link for this? I just checked dri-devel on lore, and it's not there. Nor anywhere else.
>
> The code is here: https://github.com/jlemon/zctap_kernel
>
> I know Jonathan Lemon (Cc'd) was working on upstreaming it; I saw a few patchsets.

Yeah, if the goal here is to zero-copy from IIO to network sockets, then I think we really need the full picture first, at least as a prototype. And also a rough consensus among all involved subsystems that this is the right approach and that there are no fundamental issues. I really have no clue about networking to make a call there.

I'm bringing this up because a few folks wanted to look into zero-copy between GPU and NVMe, using dma-buf. And after lots of head-banging-against-solid-concrete-walls, at least my conclusion is that due to locking issues it's really not possible without huge changes to the block I/O layer. And those are not on the table.
-Daniel

> Cheers,
> -Paul
>
> > We really need all the pieces, and if the block layer reaction is anything to judge by, dma-buf won't happen for networking either. There are some really nasty and fairly fundamental issues with locking and memory reclaim that make this an utter pain or outright impossible.
> > -Daniel
> >
> > > Cheers,
> > > -Paul
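As an illustration of the "hand-roll your buffer handling code (xarray is cheap to use in terms of code for this)" suggestion quoted above, the bookkeeping could be as small as the sketch below. The struct and helper names are invented; only xa_alloc(), xa_load() and xa_erase() are the existing xarray API from <linux/xarray.h>.

/*
 * Minimal sketch of hand-rolled buffer bookkeeping with an xarray.
 * "struct my_buf" and the my_buf_*() helpers are hypothetical.
 */
#include <linux/xarray.h>
#include <linux/slab.h>

struct my_buf {
	void *vaddr;
	size_t size;
};

/* id -> struct my_buf, with ids allocated by the xarray itself */
static DEFINE_XARRAY_ALLOC(my_bufs);

/* Register a buffer and hand back the id that userspace will use. */
static int my_buf_register(struct my_buf *buf, u32 *id)
{
	return xa_alloc(&my_bufs, id, buf, xa_limit_16b, GFP_KERNEL);
}

/* Look up the buffer behind an id passed in from userspace. */
static struct my_buf *my_buf_lookup(u32 id)
{
	return xa_load(&my_bufs, id);
}

/* Remove a buffer from the table and free it. */
static void my_buf_unregister(u32 id)
{
	struct my_buf *buf = xa_erase(&my_bufs, id);

	kfree(buf);
}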
> > > > > > > Changes since v1:
> > > > > > >
> > > > > > > - the patches that were merged in v1 have been (obviously) dropped from this patchset;
> > > > > > > - the patch that was setting the write-combine cache setting has been dropped as well, as it was simply not useful.
> > > > > > > - [01/12]:
> > > > > > >     * Only remove the outgoing queue, and keep the incoming queue, as we want the buffer to start streaming data as soon as it is enabled.
> > > > > > >     * Remove IIO_BLOCK_STATE_DEQUEUED, since it is now functionally the same as IIO_BLOCK_STATE_DONE.
> > > > > > > - [02/12]:
> > > > > > >     * Fix block->state not being reset in iio_dma_buffer_request_update() for output buffers.
> > > > > > >     * Only update block->bytes_used once and add a comment about why we update it.
> > > > > > >     * Add a comment about why we're setting a different state for output buffers in iio_dma_buffer_request_update().
> > > > > > >     * Remove useless cast to bool (!!) in iio_dma_buffer_io().
> > > > > > > - [05/12]:
> > > > > > >     Only allow the new IOCTLs on the buffer FD created with IIO_BUFFER_GET_FD_IOCTL().
> > > > > > > - [12/12]:
> > > > > > >     * Explicitly state that the new interface is optional and is not implemented by all drivers.
> > > > > > >     * The IOCTLs can now only be called on the buffer FD returned by IIO_BUFFER_GET_FD_IOCTL.
> > > > > > >     * Move the page up a bit in the index since it is core stuff and not driver-specific.
> > > > > > >
> > > > > > > The patches not listed here have not been modified since v1.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > -Paul
> > > > > > >
> > > > > > > Alexandru Ardelean (1):
> > > > > > >   iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> > > > > > >
> > > > > > > Paul Cercueil (11):
> > > > > > >   iio: buffer-dma: Get rid of outgoing queue
> > > > > > >   iio: buffer-dma: Enable buffer write support
> > > > > > >   iio: buffer-dmaengine: Support specifying buffer direction
> > > > > > >   iio: buffer-dmaengine: Enable write support
> > > > > > >   iio: core: Add new DMABUF interface infrastructure
> > > > > > >   iio: buffer-dma: Use DMABUFs instead of custom solution
> > > > > > >   iio: buffer-dma: Implement new DMABUF based userspace API
> > > > > > >   iio: buffer-dmaengine: Support new DMABUF based userspace API
> > > > > > >   iio: core: Add support for cyclic buffers
> > > > > > >   iio: buffer-dmaengine: Add support for cyclic buffers
> > > > > > >   Documentation: iio: Document high-speed DMABUF based API
> > > > > > >
> > > > > > >  Documentation/driver-api/dma-buf.rst          |   2 +
> > > > > > >  Documentation/iio/dmabuf_api.rst              |  94 +++
> > > > > > >  Documentation/iio/index.rst                   |   2 +
> > > > > > >  drivers/iio/adc/adi-axi-adc.c                 |   3 +-
> > > > > > >  drivers/iio/buffer/industrialio-buffer-dma.c  | 610 ++++++++++++++----
> > > > > > >  .../buffer/industrialio-buffer-dmaengine.c    |  42 +-
> > > > > > >  drivers/iio/industrialio-buffer.c             |  60 ++
> > > > > > >  include/linux/iio/buffer-dma.h                |  38 +-
> > > > > > >  include/linux/iio/buffer-dmaengine.h          |   5 +-
> > > > > > >  include/linux/iio/buffer_impl.h               |   8 +
> > > > > > >  include/uapi/linux/iio/buffer.h               |  30 +
> > > > > > >  11 files changed, 749 insertions(+), 145 deletions(-)
> > > > > > >  create mode 100644 Documentation/iio/dmabuf_api.rst
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
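As a usage note on the [05/12] and [12/12] changelog items quoted above: the new DMABUF ioctls are issued on the per-buffer FD, not on the IIO chardev FD. A minimal userspace sketch of obtaining that FD with the existing IIO_BUFFER_GET_FD_IOCTL follows; "/dev/iio:device0" is only a placeholder path, and the new DMABUF ioctls themselves are left as a comment since their names are defined by the patchset.

/*
 * Userspace sketch: get the per-buffer FD on which the series' new
 * DMABUF ioctls are meant to be called.  IIO_BUFFER_GET_FD_IOCTL is
 * the existing ioctl from <linux/iio/buffer.h>.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/iio/buffer.h>

int main(void)
{
	int dev_fd, arg;

	dev_fd = open("/dev/iio:device0", O_RDWR);	/* placeholder path */
	if (dev_fd < 0) {
		perror("open");
		return 1;
	}

	/* The int argument carries the buffer index in and the new FD out. */
	arg = 0;	/* buffer 0 */
	if (ioctl(dev_fd, IIO_BUFFER_GET_FD_IOCTL, &arg) < 0) {
		perror("IIO_BUFFER_GET_FD_IOCTL");
		close(dev_fd);
		return 1;
	}

	/*
	 * "arg" now holds the per-buffer FD; the DMABUF alloc/enqueue
	 * ioctls added by this series would be issued on it here.
	 */
	close(arg);
	close(dev_fd);
	return 0;
}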