Re: [RFC] Experimental DMA-BUF Device Heaps

Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx> · Mon, 31 Aug 2020 18:08:41 +0300

Hi James,

On Sun, Aug 23, 2020 at 03:53:50PM -0700, James Jones wrote:
> On 8/23/20 1:46 PM, Laurent Pinchart wrote:
> > On Sun, Aug 23, 2020 at 01:04:43PM -0700, James Jones wrote:
> >> On 8/20/20 1:15 AM, Ezequiel Garcia wrote:
> >>> On Mon, 2020-08-17 at 20:49 -0700, James Jones wrote:
> >>>> On 8/17/20 8:18 AM, Brian Starkey wrote:
> >>>>> On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote:
> >>>>>> This heap is basically a wrapper around DMA-API dma_alloc_attrs,
> >>>>>> which will allocate memory suitable for the given device.
> >>>>>>
> >>>>>> The implementation is mostly a port of the Contiguous Videobuf2
> >>>>>> memory allocator (see videobuf2/videobuf2-dma-contig.c)
> >>>>>> over to the DMA-BUF Heap interface.
> >>>>>>
> >>>>>> The intention of this allocator is to provide applications
> >>>>>> with a more system-agnostic API: the only thing the application
> >>>>>> needs to know is which device to get the buffer for.
> >>>>>>
> >>>>>> Whether the buffer is backed by CMA, IOMMU or a DMA Pool
> >>>>>> is unknown to the application.
> >>>>>>
> >>>>>> I'm not really expecting this patch to be correct or even
> >>>>>> a good idea, but just submitting it to start a discussion on DMA-BUF
> >>>>>> heap discovery and negotiation.
> >>>>>>
> >>>>>
> >>>>> My initial reaction is that I thought dmabuf heaps are meant for use
> >>>>> to allocate buffers for sharing across devices, which doesn't fit very
> >>>>> well with having per-device heaps.
> >>>>>
> >>>>> For single-device allocations, would using the buffer allocation
> >>>>> functionality of that device's native API be better in most
> >>>>> cases? (Some other possibly relevant discussion at [1])
> >>>>>
> >>>>> I can see that this can save some boilerplate for devices that want
> >>>>> to expose private chunks of memory, but might it also lead to 100
> >>>>> aliases for the system's generic coherent memory pool?
> >>>>>
> >>>>> I wonder if a set of helpers to allow devices to expose whatever they
> >>>>> want with minimal effort would be better.
> >>>>
> >>>> I'm rather interested on where this goes, as I was toying with using
> >>>> some sort of heap ID as a basis for a "device-local" constraint in the
> >>>> memory constraints proposals Simon and I will be discussing at XDC this
> >>>> year.  It would be rather elegant if there was one type of heap ID used
> >>>> universally throughout the kernel that could provide a unique handle for
> >>>> the shared system memory heap(s), as well as accelerator-local heaps on
> >>>> fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could
> >>>> negotiate a location among themselves.  This patch seems to be a step
> >>>> towards that in a way, but I agree it would be counterproductive if a
> >>>> bunch of devices that were using the same underlying system memory ended
> >>>> up each getting their own heap ID just because they used some SW
> >>>> framework that worked that way.
> >>>>
> >>>> Would appreciate it if you could send along a pointer to your BoF if it
> >>>> happens!
> >>>
> >>> Here is it:
> >>>
> >>> https://linuxplumbersconf.org/event/7/contributions/818/
> >>>
> >>> It would be great to see you there and discuss this,
> >>> given I was hoping we could talk about how to meet a
> >>> userspace allocator library expectations as well.
> >>
> >> Thanks!  I hadn't registered for LPC and it looks like it's sold out,
> >> but I'll try to watch the live stream.
> >>
> >> This is very interesting, in that it looks like we're both trying to
> >> solve roughly the same set of problems but approaching it from different
> >> angles.  From what I gather, your approach is that a "heap" encompasses
> >> all the allocation constraints a device may have.
> >>
> >> The approach Simon Ser and I are tossing around so far is somewhat
> >> different, but may potentially leverage dma-buf heaps a bit as well.
> >>
> >> Our approach looks more like what I described at XDC a few years ago,
> >> where memory constraints for a given device's usage of an image are
> >> exposed up to applications, which can then somehow perform boolean
> >> intersection/union operations on them to arrive at a common set of
> >> constraints that describe something compatible with all the devices &
> >> usages desired (or fail to do so, and fall back to copying things around
> >> presumably).  I believe this is more flexible than your initial proposal
> >> in that devices often support multiple usages (E.g., different formats,
> >> different proprietary layouts represented by format modifiers, etc.),
> >> and it avoids adding a combinatorial number of heaps to manage that.
> >>
> >> In my view, heaps are more like blobs of memory that can be allocated
> >> from in various different ways to satisfy constraints.  I realize heaps
> >> mean something specific in the dma-buf heap design (specifically,
> >> something closer to an association between an "allocation mechanism" and
> >> "physical memory"), but I hope we don't have massive heap/allocator
> >> mechanism proliferation due to constraints alone.  Perhaps some
> >> constraints, such as contiguous memory or device-local memory, are
> >> properly expressed as a specific heap, but consider the proliferation
> >> implied by even that simple pair of examples: How do you express
> >> contiguous device-local memory?  Do you need to spawn two heaps on the
> >> underlying device-local memory, one for contiguous allocations and one
> >> for non-contiguous allocations?  Seems excessive.
> >>
> >> Of course, our approach also has downsides and is still being worked on.
> >>    For example, it works best in an ideal world where all the allocators
> >> available understand all the constraints that exist.
> > 
> > Shouldn't allocators be decoupled of constraints ? In my imagination I
> > see devices exposing constraints, and allocators exposing parameters,
> > with a userspace library to reconcile the constraints and produce
> > allocator parameters from them.
> 
> Perhaps another level of abstraction would help.  I'll have to think 
> about that.
> 
> However, as far as I can tell, it wouldn't remove the need to 
> communicate a lot of constraints from multiple engines/devices/etc. to 
> the allocator (likely a single allocator.  I'd be interested to know if 
> anyone has a design that effectively uses multiple allocators to satisfy 
> a single allocation request, but I haven't come up with a good one) 
> somehow.  Either the constraints are directly used as the parameters, or 
> there's a translation/second level of abstraction, but either way much 
> of the information needs to make it to the allocator, or represent the 
> need to use a particular allocator.  Simple things like pitch and offset 
> alignment can be done without help from a kernel-level allocator, but 
> others such as cache coherency, physical memory bank placement, or 
> device-local memory will need to make it all the way down to the kernel 
> some how I believe.

I fully agree that we'll need kernel support, but I don't think the
constraints reporting API and the allocator API need to speak the same
language. For instance, drivers will report alignment constraints, the
userspace allocator library will translate that to a pitch value, and
pass it to the allocator as an allocation parameter. The allocator won't
know about alignment constraints. That's a simple example, let's see how
it turns out with more complex constraints. With a centralized userspace
library we have the ability to decouple the two sides, which I believe
can be useful to keep the complexity of constraints and allocation
parameters (as) low (as possible).

> >> Dealing with a
> >> reality where there are probably a handful of allocators, another
> >> handful of userspace libraries and APIs, and still more applications
> >> trying to make use of all this is one of the larger remaining challenges
> >> of the design.
> >>
> >> We'll present our work at XDC 2020.  Hope you can check that out as well!
> >>
> >>>>> 1. https://lore.kernel.org/dri-devel/57062477-30e7-a3de-6723-a50d03a402c4@xxxxxxxx/
> >>>>>
> >>>>>> Given Plumbers is just a couple weeks from now, I've submitted
> >>>>>> a BoF proposal to discuss this, as perhaps it would make
> >>>>>> sense to discuss this live?
> >>>>>>
> >>>>>> Not-signed-off-by: Ezequiel Garcia <ezequiel@xxxxxxxxxxxxx>
> > 

-- 
Regards,

Laurent Pinchart