Re: [PATCH 0/8] dma-buf: heaps: Support carved-out heaps and ECC related-flags

Thierry Reding <thierry.reding@xxxxxxxxx> · Thu, 11 Jul 2024 14:43:24 +0200

On Wed, Jul 10, 2024 at 02:10:09PM GMT, Maxime Ripard wrote:
> On Fri, Jul 05, 2024 at 04:31:34PM GMT, Thierry Reding wrote:
> > On Thu, Jul 04, 2024 at 02:24:49PM GMT, Maxime Ripard wrote:
> > > On Fri, Jun 28, 2024 at 04:42:35PM GMT, Thierry Reding wrote:
> > > > On Fri, Jun 28, 2024 at 03:08:46PM GMT, Maxime Ripard wrote:
> > > > > Hi,
> > > > > 
> > > > > On Fri, Jun 28, 2024 at 01:29:17PM GMT, Thierry Reding wrote:
> > > > > > On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote:
> > > > > > > On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote:
> > > > > > > > On Thu, May 16, 2024 at 3:56 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
> > > > > > > > > On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:
> > > > > > > > > > But it makes me a little nervous to add a new generic allocation flag
> > > > > > > > > > for a feature most hardware doesn't support (yet, at least). So it's
> > > > > > > > > > hard to weigh how common the actual usage will be across all the
> > > > > > > > > > heaps.
> > > > > > > > > >
> > > > > > > > > > I apologize as my worry is mostly born out of seeing vendors really
> > > > > > > > > > push opaque feature flags in their old ion heaps, so in providing a
> > > > > > > > > > flags argument, it was mostly intended as an escape hatch for
> > > > > > > > > > obviously common attributes. So having the first be something that
> > > > > > > > > > seems reasonable, but isn't actually that common makes me fret some.
> > > > > > > > > >
> > > > > > > > > > So again, not an objection, just something for folks to stew on to
> > > > > > > > > > make sure this is really the right approach.
> > > > > > > > >
> > > > > > > > > Another good reason to go with full heap names instead of opaque flags on
> > > > > > > > > existing heaps is that with the former we can use symlinks in sysfs to
> > > > > > > > > specify heaps, with the latter we need a new idea. We haven't yet gotten
> > > > > > > > > around to implement this anywhere, but it's been in the dma-buf/heap todo
> > > > > > > > > since forever, and I like it as a design approach. So would be a good idea
> > > > > > > > > to not toss it. With that display would have symlinks to cma-ecc and cma,
> > > > > > > > > and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a
> > > > > > > > > SoC where the display needs contig memory for scanout.
> > > > > > > > 
> > > > > > > > So indeed that is a good point to keep in mind, but I also think it
> > > > > > > > > might reinforce the choice of having ECC as a flag here.
> > > > > > > > 
> > > > > > > > Since my understanding of the sysfs symlinks to heaps idea is about
> > > > > > > > being able to figure out a common heap from a collection of devices,
> > > > > > > > it's really about the ability for the driver to access the type of
> > > > > > > > memory. If ECC is just an attribute of the type of memory (as in this
> > > > > > > > patch series), it being on or off won't necessarily affect
> > > > > > > > compatibility of the buffer with the device.  Similarly "uncached"
> > > > > > > > seems more of an attribute of memory type and not a type itself.
> > > > > > > > Hardware that can access non-contiguous "system" buffers can access
> > > > > > > > uncached system buffers.
> > > > > > > 
> > > > > > > Yeah, but in graphics there's a wide band where "shit performance" is
> > > > > > > > de facto "not usable (as intended at least)".
> > > > > > > 
> > > > > > > So if we limit the symlink idea to just making sure zero-copy access is
> > > > > > > possible, then we might not actually solve the real world problem we need
> > > > > > > to solve. And so the symlinks become somewhat useless, and we need to
> > > > > > > somewhere encode which flags you need to use with each symlink.
> > > > > > > 
> > > > > > > > But I also see the argument that there's a bit of a combinatorial explosion
> > > > > > > possible. So I guess the question is where we want to handle it ...
> > > > > > 
> > > > > > Sorry for jumping into this discussion so late. But are we really
> > > > > > concerned about this combinatorial explosion in practice? It may be
> > > > > > theoretically possible to create any combination of these, but do we
> > > > > > expect more than a couple of heaps to exist in any given system?
> > > > > 
> > > > > I don't worry too much about the number of heaps available in a given
> > > > > system, it would indeed be fairly low.
> > > > > 
> > > > > My concern is about the semantics combinatorial explosion. So far, the
> > > > > name has carried what semantics we were supposed to get from the buffer
> > > > > we allocate from that heap.
> > > > > 
> > > > > The more variations and concepts we'll have, the more heap names we'll
> > > > > need, and with confusing names since we wouldn't be able to change the
> > > > > names of the heaps we already have.
> > > > 
> > > > What I was trying to say is that none of this matters if we make these
> > > > names opaque. If these names are contextual for the given system it
> > > > doesn't matter what the exact capabilities are. It only matters that
> > > > their purpose is known and that's what applications will be interested
> > > > in.
> > > 
> > > If the names are opaque, and we don't publish what the exact
> > > capabilities are, how can an application figure out which heap to use in
> > > the first place?
> > 
> > This would need to be based on conventions. The idea is to standardize
> > on a set of names for specific, well-known use-cases.
> 
> How can undocumented, unenforced conventions work in practice?
> 
> > > > > > Would it perhaps make more sense to let a platform override the heap
> > > > > > name to make it more easily identifiable? Maybe this is a naive
> > > > > > > assumption, but aren't userspace applications and drivers primarily
> > > > > > interested in the "type" of heap rather than whatever specific flags
> > > > > > have been set for it?
> > > > > 
> > > > > I guess it depends on what you call the type of a heap. Where we
> > > > > allocate the memory from, sure, an application won't care about that.
> > > > > How the buffer behaves on the other end is definitely something
> > > > > applications are going to be interested in though.
> > > > 
> > > > Most of these heaps will be very specific, I would assume.
> > > 
> > > We don't have any specific heap upstream at the moment, only generic
> > > ones.
> > 
> > But we're trying to add more specific ones, right?
> > 
> > > > For example a heap that is meant to be protected for protected video
> > > > decoding is both going to be created in such a way as to allow that
> > > > use-case (i.e. it doesn't make sense for it to be uncached, for
> > > > example) and it's also not going to be useful for any other use-case
> > > > (i.e. there's no reason to use that heap for GPU jobs or networking,
> > > > or whatever).
> > > 
> > > Right. But also, libcamera has started to use dma-heaps to allocate
> > > dma-capable buffers and do software processing on it before sending it
> > > to some hardware controller.
> > > 
> > > Caches are critical here, and getting a non-cacheable buffer would be
> > > a clear regression.
> > 
> > I understand that. My point is that maybe we shouldn't try to design a
> > complex mechanism that allows full discoverability of everything that a
> > heap supports or is capable of. Instead if the camera has specific
> > requirements, it could look for a heap named "camera". Or if it can
> > share a heap with other multimedia devices, maybe call the heap
> > "multimedia".
> 
> That kind of vague categorization is pointless though. Some criteria are
> about hardware (ie, can the device access it in the first place?), some are
> purely about a particular context and policy and will change from one
> application to the other.
> 
> A camera app using an ISP will not care about caches. A software
> rendering library will. A compositor will not want ECC. A safety
> component probably will.
> 
> All of them are "multimedia".
> 
> We *need* to be able to differentiate policy from hardware requirements.
> 
> > The idea is that heaps for these use-cases are quite specific, so you
> > would likely not find an arbitrary number of processes try to use the
> > same heap.
> 
> Some of them are specific, some of them aren't.
> 
> > > How can it know which heap to allocate from on a given platform?
> > > 
> > > Similarly with the ECC support we started that discussion with. ECC will
> > > introduce a significant performance cost. How can a generic application,
> > > such as a compositor, know which heap to allocate from without:
> > > 
> > > a) Trying to bundle up a list of heaps for each platform it might or
> > >    might not run
> > > 
> > > b) and handling the name difference between BSPs and mainline.
> > 
> > Obviously some standardization of heap names is a requirement here,
> > otherwise such a proposal does indeed not make sense.
> > 
> > > If some hardware-specific applications / middleware want to take a
> > > shortcut and use the name, that's fine. But we need to find a way for
> > > generic applications to discover which heap is best suited for their
> > > needs without the name.
> > 
> > You can still have fairly generic names for heaps. If you want protected
> > content, you could try to use a standard "video-protected" heap. If you
> > need ECC protected memory, maybe you want to allocate from a heap named
> > "safety", or whatever.
> 
> And if I need cacheable, physically contiguous, "multimedia" buffers from
> ECC protected memory?
> 
> > > > > And if we allow any platform to change a given heap name, then a generic
> > > > > application won't be able to support that without some kind of
> > > > > platform-specific configuration.
> > > > 
> > > > We could still standardize on common use-cases so that applications
> > > > would know what heaps to allocate from. But there's also no need to
> > > > arbitrarily restrict this. For example there could be cases that are
> > > > very specific to a particular platform and which just doesn't exist
> > > > anywhere else. Platform designers could then still use this mechanism to
> > > > define that very particular heap and have a very specialized userspace
> > > > application use that heap for their purpose.
> > > 
> > > We could just add a different capability flag to make sure those would
> > > get ignored.
> > 
> > Sure you can do all of this with a myriad of flags. But again, I'm
> > trying to argue that we may not need this additional complexity. In a
> > typical system, how many heaps do you encounter? You may need a generic
> > one and then perhaps a handful specific ones? Or do you need more?
> 
> It's not a matter of the number of heaps, but what they provide.
> 
> > > > > > For example, if an application wants to use a protected buffer, the
> > > > > > application doesn't (and shouldn't need to) care about whether the heap
> > > > > > for that buffer supports ECC or is backed by CMA. All it really needs to
> > > > > > know is that it's the system's "protected" heap.
> > > > > 
> > > > > I mean... "protected" very much means backed by CMA already, it's pretty
> > > > > much the only thing we document, and we call it as such in Kconfig.
> > > > 
> > > > Well, CMA is really just an implementation detail, right? It doesn't
> > > > make sense to advertise that to anything outside the kernel. Maybe it's
> > > > an interesting fact that buffers allocated from these heaps will be
> > > > physically contiguous?
> > > 
> > > CMA itself might be an implementation detail, but it's still right there
> > > in the name on ARM.
> > 
> > That doesn't mean we can't do something more useful going forward (and
> > perhaps symlink for backwards-compatibility if needed).
> > 
> > > And being able to get physically contiguous buffers is critical on
> > > platforms without an IOMMU.
> > 
> > Again, I'm not trying to dispute the necessity of contiguous buffers.
> > I'm trying to say that contextual names can be a viable alternative to
> > full discoverability. If you want contiguous buffers, go call the heap
> > "contiguous" and it's quite clear what it means.
> > 
> > You can even hide details such as IOMMU availability from userspace that
> > way. On a system where an IOMMU is present, you could for example go and
> > use IOMMU-backed memory in a "contiguous" heap, while on a system
> > without an IOMMU the memory for the "contiguous" heap could come from
> > CMA.
> 
> I can see the benefits from that, and it would be quite nice indeed.
> However, it still only addresses the "hardware" part of the requirements
> (ie, is it contiguous, accessible, etc.). It doesn't address
> applications having different requirements when it comes to what kind of
> attributes they'd like/need to get from the buffer.
> 
> If one application in the system wants contiguous (using your definition
> just above) buffers without caches, and the other wants to have
> contiguous cacheable buffers, if we're only using the name we'd need to
> instantiate two heaps, from the same allocator, for what's essentially a
> mapping attribute.
> 
> It's more complex for the kernel, more code to maintain, and more
> complex for applications too because they need to know about what a
> given name means for that particular context.
> 
> > > > In the majority of cases that's probably not even something that
> > > > matters because we get a DMA-BUF anyway and we can map that any way we
> > > > want.
> > > >
> > > > Irrespective of that, physically contiguous buffers could be allocated in
> > > > any number of ways, CMA is just a convenient implementation of one such
> > > > allocator.
> > > > 
> > > > > But yeah, I agree that being backed by CMA is probably not what an
> > > > > application cares about (and we might even have some discussions about
> > > > > that), but if the ECC protection comes at a performance cost then it
> > > > > will very much care about it. Or if it comes with caches enabled or not.
> > > > 
> > > > True, no doubt about that. However, I'm saying there may be advantages
> > > > in hiding all of this from applications. Let's say we're trying to
> > > > implement video decoding. We can create a special "protected-video" heap
> > > > that is specifically designed to allocate encrypted/protected scanout
> > > > buffers from.
> > > > 
> > > > When you design that system, you would most certainly not enable ECC
> > > > protection on that heap because it leads to bad performance. You would
> > > > also want to make sure that all of the buffers in that heap are cached
> > > > and whatever other optimizations your chip may provide.
> > > > 
> > > > Your application doesn't have to care about this, though, because it can
> > > > simply look for a heap named "protected-video" and allocate buffers from
> > > > it.
> > > 
> > > I mean, I disagree. Or rather, in an environment where you have a system
> > > architect, and the application is targeted for a particular system only,
> > > and where "protected-video" means whatever the team decided in general,
> > > yeah, that works.
> > > 
> > > So, in a BSP or Android, that works fine.
> > > 
> > > On a mainline based system, with generic stacks like libcamera, it just
> > > doesn't fly anymore.
> > 
> > I'm not sure I know of a system that isn't architected. Even very
> > "generic" devices have a set of functionality that the manufacturer
> > wanted the device to provide.
> >
> > Aren't generic stacks also built to provide a specific function?
> > Again, libcamera could try to use a "camera" heap, or maybe it would fit
> > into that "multimedia" category.
> > 
> > For truly generic systems you typically don't need any of this, right? A
> > generic system like a PC usually gets by with just system memory and
> > maybe video RAM for some specific cases.
> 
> Why wouldn't we need this for a truly generic system?

Because ARM systems really aren't that generic. That's why we need these
special carveouts and such in the first place.

Once you start making an ARM system more generic (say, by adding things
like PCI devices and such into the mix), then these specific cases tend
to go away.

Another way of saying this is that these carveouts are usually needed
for some SoC-specific functionality, so they are inherently bound to
that SoC and no longer generic.

> With ARM laptops around the corner, pretty much the same SoC can be used
> in a tablet, in a car, or in a "generic system like a PC".

A "generic system like a PC" based on ARM would still be tied to the
specific ARM SoC that's being used, so it's not generic in the same way
that a PC is.

Fundamentally the same SoC is going to need the same carveouts, whether
it's used in a tablet, in a car or in a laptop. The carveout's use is
tied to a particular function of the system. Anything that is not tied
to a particular function is just plain old system memory, isn't it?

Of course I may be completely ignorant of whatever it is that you have
in mind, so maybe you can provide some concrete examples of where this
isn't the case?

Thierry