On Tue, 21 May 2024 at 10:43 +0200, Maxime Ripard wrote:
> On Thu, May 16, 2024 at 01:11:51PM GMT, nicolas.dufresne@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx wrote:
> > On Thu, 16 May 2024 at 14:27 +0300, Laurent Pinchart wrote:
> > > Hi Nicolas,
> > >
> > > On Wed, May 15, 2024 at 01:43:58PM -0400, nicolas.dufresne@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx wrote:
> > > > On Tue, 14 May 2024 at 23:42 +0300, Laurent Pinchart wrote:
> > > > > > You'll hit the same limitation as we hit in GStreamer, which is that KMS drivers
> > > > > > only offer allocation for render buffers and most of them are missing allocators
> > > > > > for YUV buffers, even though they can import in these formats. (KMS allocators,
> > > > > > except dumb, which has other issues, are format-aware.)
> > > > >
> > > > > My experience on Arm platforms is that the KMS drivers offer allocation
> > > > > for scanout buffers, not render buffers, and mostly using the dumb
> > > > > allocator API. If the KMS device can scan out YUV natively, YUV buffer
> > > > > allocation should be supported. Am I missing something here ?
> > > >
> > > > There are two APIs. Dumb is the legacy allocation API, only used by display
> > >
> > > Is it legacy only ? I understand the dumb buffers API to be officially
> > > supported, to allocate scanout buffers suitable for software rendering.
> > >
> > > > drivers indeed, and the API does not include a pixel format or a modifier. The
> > > > allocation of YUV buffers has been made through a small hack:
> > > >
> > > >   bpp    = number of bits per component (of the luma plane if there are multiple planes)
> > > >   width  = width
> > > >   height = height * X
> > > >
> > > > where X varies: "3 / 2" is used for 4:2:0 subsampling, "2" for 4:2:2 and "3" for
> > > > 4:4:4.
> > > > It is far from ideal, and requires deep knowledge of each format in the
> > > > application
> > >
> > > I'm not sure I see that as an issue, but our experiences and use cases
> > > may vary :-)
> >
> > It's an extra burden, and it does not scale to all available pixel formats. My
> > reply was for the readers' education, as I feel like a lot of linux-media
> > developers don't have a clue about what is going on on the rendering side.
> > This ensures a minimum level of knowledge for everyone commenting.
> >
> > And yes, within the GFX community, dumb allocation is to be killed and
> > replaced completely in the future; it simply does not have a complete
> > replacement yet.
> >
> > > > and cannot allocate each plane separately.
> > >
> > > For semi-planar or planar formats, unless I'm mistaken, you can either
> > > allocate a single buffer and use it with appropriate offsets when
> > > constructing your framebuffer (with DRM_IOCTL_MODE_ADDFB2), or allocate
> > > one buffer per plane.
> >
> > We have use cases where a single allocation is undesirable, but I don't really
> > feel like this is important enough for me to type up the explanation. Ping me
> > if you care.
> >
> > > > The second is to use the driver-specific allocation API. This is then abstracted
> > > > by GBM. This allows allocating render buffers, notably with modifiers and/or use
> > > > cases, but with no support for YUV formats or multi-planar formats.
> > >
> > > GBM is the way to go for render buffers indeed. It has been designed
> > > with only graphics buffer management use cases in mind, so it's
> > > unfortunately not an option as a generic allocator, at least in its
> > > current form.
> >
> > What I perhaps should have highlighted is that all these allocators in the
> > GFX (called DRM, but meh) subsystem abstract away some deep knowledge of the
> > HW requirements. Heaps are lower-level APIs that assume that userspace has
> > this knowledge.
> > The Android and ChromeOS solution is to take the implementation out of the
> > kernel and move it into userspace; see minigbm from ChromeOS, or gralloc
> > from Android. As these two projects are device-centric, they are not usable
> > on generic Linux. Heaps might have some future, but not without other pieces
> > of the puzzle.
> >
> > To come back to you wanting heaps in libcamera because it makes them better
> > for rendering or display: well, today this is a lie you make to yourself,
> > because this is just a tiny bit of the puzzle; it is pure luck if a dmabuf
> > you allocate is usable by a foreign device. At the end of the day, this is
> > just a fallback so that applications are not forced to allocate that memory
> > themselves.
>
> I mean, it's pure luck, but can you point to any platform supported
> upstream where it wouldn't work?

Most AMD GPUs need 256-byte-aligned strides. So unless you have that hardcoded
in libcamera, it is one case that will often fail to import. There is no kernel
API to query this anyway, so hardcoding is becoming common with the growing
popularity of these GPUs. Mali requires 64-byte alignment, except for some YUV
formats on very recent Mesa. If you hardcode for AMD, it works for Mali too.

The Intel display driver is an interesting one. Most software dmabuf exporters
will enable CPU caching (the UVC driver included). That driver fails to reject
these dmabufs, assuming the exporter will always flush the cache. The UVC
driver, as an exporter, does not, so it's not so clear to me whether
dmaheap + softISP (assuming the softISP does the dmabuf sync calls) will work
or not. UVC-to-display artifacts were still visible on 6.8 (the last time I
tested).

> > Thus, I strongly recommend udmabuf in the short term. Finally, moving to
> > heaps when the reported issue is resolved, as that then gives more options
> > and reduces the number of layers.
>
> udmabuf wouldn't work with any platform without an IOMMU. We have plenty
> of those.
It is up to userspace to decide whether to allocate scattered memory or not,
but again, there is no generic API to let the application (the softISP) know.
Many of our real-life tests concluded that using malloc'ed data in software
video processing and finally doing a memcpy() into the final "device" memory
is faster than using a "coherent" allocation or doing the cache handling
ourselves.

> All things considered, while I agree that it isn't the ideal solution,
> we really don't have a better (ie, works on a larger set of platforms)
> solution at the moment or in the next 5 years.

Indeed. In the short term, I like the idea that we'll first make it safe to
expose the heaps at all times, so that at least we have a choice. Today, on
most major distributions, none of the solutions mentioned are available. I
have no idea how much work that is.

Nicolas