Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for custom page providers

On Tue, May 07, 2024 at 01:48:38PM -0300, Jason Gunthorpe wrote:
> On Tue, May 07, 2024 at 09:42:05AM -0700, Mina Almasry wrote:
> 
> > 1. Align with devmem TCP to use udmabuf for your io_uring memory. I
> > think in the past you said it's a uapi you don't link but in the face
> > of this pushback you may want to reconsider.
> 
> dmabuf does not force a uapi, you can acquire your pages however you
> want and wrap them up in a dmabuf. No uapi at all.
> 
> The point is that dmabuf already provides ops that do basically what
> is needed here. We don't need ops calling ops just because dmabuf's
> ops are not understood or not perfect. Fix up dmabuf.
> 
> If io_uring wants to take its existing memory pre-registration it can
> wrap that in a dmabuf, and somehow pass it to the netstack. Userspace
> doesn't need to know a dmabuf is being used in the background.
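
For concreteness, that wrapping is just an in-kernel exporter around the
already-pinned pages, roughly like the sketch below; the iou_* names are
invented and the release hook is a stub, it's only meant to show that no
new uapi is involved:

#include <linux/dma-buf.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/* Hypothetical container for io_uring's pre-registered, pinned pages. */
struct iou_buf {
        struct page **pages;
        unsigned long nr_pages;
};

static struct sg_table *iou_dmabuf_map(struct dma_buf_attachment *attach,
                                       enum dma_data_direction dir)
{
        struct iou_buf *buf = attach->dmabuf->priv;
        struct sg_table *sgt;
        int ret;

        sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
        if (!sgt)
                return ERR_PTR(-ENOMEM);

        ret = sg_alloc_table_from_pages(sgt, buf->pages, buf->nr_pages, 0,
                                        buf->nr_pages << PAGE_SHIFT,
                                        GFP_KERNEL);
        if (ret)
                goto err_free;

        ret = dma_map_sgtable(attach->dev, sgt, dir, 0);
        if (ret)
                goto err_sg;

        return sgt;

err_sg:
        sg_free_table(sgt);
err_free:
        kfree(sgt);
        return ERR_PTR(ret);
}

static void iou_dmabuf_unmap(struct dma_buf_attachment *attach,
                             struct sg_table *sgt,
                             enum dma_data_direction dir)
{
        dma_unmap_sgtable(attach->dev, sgt, dir, 0);
        sg_free_table(sgt);
        kfree(sgt);
}

static void iou_dmabuf_release(struct dma_buf *dmabuf)
{
        /* Pages stay owned by io_uring's registration; nothing to do. */
}

static const struct dma_buf_ops iou_dmabuf_ops = {
        .map_dma_buf   = iou_dmabuf_map,
        .unmap_dma_buf = iou_dmabuf_unmap,
        .release       = iou_dmabuf_release,
};

static struct dma_buf *iou_wrap_in_dmabuf(struct iou_buf *buf)
{
        DEFINE_DMA_BUF_EXPORT_INFO(exp_info);

        exp_info.ops  = &iou_dmabuf_ops;
        exp_info.size = buf->nr_pages << PAGE_SHIFT;
        exp_info.priv = buf;

        return dma_buf_export(&exp_info);       /* ERR_PTR() on failure */
}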

So roughly, the current dma-buf design considerations for users of the
dma-api interfaces are:

- It's a memory buffer of fixed length.

- Either that memory is permanently nailed into place with dma_buf_pin
  (and if we add more users than just drm display then we should probably
  figure out the mlock accounting question for these). For the locking
  hierarchy, dma_buf_pin uses dma_resv_lock, which nests within
  mmap_sem/vma locks but outside of any reclaim/alloc contexts (see the
  pinning sketch after this list).

- Or the memory is more dynamic, in which case you need to be able to
  dma_resv_lock when you need the memory and make a promise (as a
  dma_fence) that you'll release the memory within finite time and without
  any further allocations once you've unlocked the dma_buf (because
  dma_fence signalling is effectively GFP_NORECLAIM). That promise can be
  waiting for memory access to finish, but it can also be a pte
  invalidate+tlb flush, or some kind of preemption, or whatever your hw
  can do really (the move_notify sketch below shows the importer side of
  this).

  Also, if you do this dynamic model and need to atomically reserve more
  than one dma_buf, you get to do the wait/wound mutex dance, but that's
  really just a bunch of funny-looking error-handling code that doesn't
  impact the overall design or locking hierarchy (that one is sketched
  last below).
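
To make the locking side a bit more concrete, here is a rough sketch of
the pinned model from the importer's point of view; netmem_* is an
invented name, and the attachment is assumed to have been set up with
dma_buf_dynamic_attach() beforehand:

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

static int netmem_pin_buffer(struct dma_buf_attachment *attach,
                             struct sg_table **sgtp)
{
        struct dma_buf *dmabuf = attach->dmabuf;
        struct sg_table *sgt;
        int ret;

        /*
         * dma_resv_lock nests within mmap_sem/vma locks but outside of
         * any reclaim/alloc contexts, so it must not be taken from those.
         */
        ret = dma_resv_lock(dmabuf->resv, NULL);
        if (ret)
                return ret;

        ret = dma_buf_pin(attach);      /* memory is nailed down from here */
        if (ret)
                goto out_unlock;

        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt)) {
                dma_buf_unpin(attach);
                ret = PTR_ERR(sgt);
                goto out_unlock;
        }
        *sgtp = sgt;

out_unlock:
        dma_resv_unlock(dmabuf->resv);
        return ret;
}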
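
The dynamic model on the importer side then boils down to a move_notify
callback plus that "finite time, no allocations" promise expressed as a
dma_fence; again only a sketch, struct netmem_binding and
netmem_stop_device_access() are made up:

static void netmem_move_notify(struct dma_buf_attachment *attach)
{
        struct netmem_binding *bind = attach->importer_priv;

        /* The exporter calls this with the dma_resv lock already held. */
        dma_resv_assert_held(attach->dmabuf->resv);

        /*
         * Kick the hw off the buffer (pte invalidate + tlb flush, some
         * form of preemption, ...) and publish a dma_fence that signals
         * once the last access is done.  No memory allocations from
         * here until that fence has signalled.
         */
        netmem_stop_device_access(bind);
}

static const struct dma_buf_attach_ops netmem_attach_ops = {
        .move_notify = netmem_move_notify,
};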
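
And atomically grabbing more than one buffer is the usual wait/wound
dance, which really is just error handling; sketched here for two
reservations, netmem_lock_pair() is an invented helper:

static int netmem_lock_pair(struct dma_resv *a, struct dma_resv *b,
                            struct ww_acquire_ctx *ctx)
{
        int ret;

        ww_acquire_init(ctx, &reservation_ww_class);

        ret = dma_resv_lock(a, ctx);
        if (ret)
                goto err_fini;

        while ((ret = dma_resv_lock(b, ctx)) == -EDEADLK) {
                /*
                 * Wounded: drop what we hold, take the contended lock in
                 * the slow path, then retry the other one.
                 */
                dma_resv_unlock(a);
                dma_resv_lock_slow(b, ctx);
                swap(a, b);
        }
        if (ret) {
                dma_resv_unlock(a);
                goto err_fini;
        }

        ww_acquire_done(ctx);
        return 0;       /* unlock both + ww_acquire_fini() when finished */

err_fini:
        ww_acquire_fini(ctx);
        return ret;
}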

Everything else we can adjust, but I think the above three are not really
changeable, or dma-buf becomes unusable for gpu drivers.

Note that exporters of dma-buf can pretty much do whatever they feel like,
including rejecting all the generic interfaces/ops, because we also use
dma-buf as userspace handles for some really special memory.
-Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



