Re: Try to address the DMA-buf coherency problem

On 03.11.22 at 23:16, Nicolas Dufresne wrote:
[SNIP]
>> We already had numerous projects where we reported this practice as bugs
>> to the GStreamer and FFMPEG project because it won't work on x86 with dGPUs.
> Links ? Remember that I do read every single bugs and emails around GStreamer
> project. I do maintain older and newer V4L2 support in there. I also did
> contribute a lot to the mechanism GStreamer have in-place to reverse the
> allocation. In fact, its implemented, the problem being that on generic Linux,
> the receiver element, like the GL element and the display sink don't have any
> API they can rely on to allocate memory. Thus, they don't implement what we call
> the allocation offer in GStreamer term. Very often though, on other modern OS,
> or APIs like VA, the memory offer is replaced by a context. So the allocation is
> done from a "context" which is neither an importer or an exporter. This is
> mostly found on MacOS and Windows.
>
> Was there APIs suggested to actually make it manageable by userland to allocate
> from the GPU? Yes, this what Linux Device Allocator idea is for. Is that API
> ready, no.

Well, that stuff is absolutely ready: https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/heaps/system_heap.c#L175 What do you think I'm talking about all the time?

DMA-buf has a lengthy section about CPU access to buffers and clearly documents how all of that is supposed to work: https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/dma-buf.c#L1160 This includes bracketing of CPU access with dma_buf_begin_cpu_access() and dma_buf_end_cpu_access(), as well as transaction management between devices and the CPU and even implicit synchronization.

This specification is then implemented by the different drivers including V4L2: https://elixir.bootlin.com/linux/latest/source/drivers/media/common/videobuf2/videobuf2-dma-sg.c#L473

As well as the different DRM drivers: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L117 https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c#L234
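
For illustration, the CPU-access bracketing described above is reachable from userspace through DMA_BUF_IOCTL_SYNC on the dma-buf file descriptor, which the kernel maps onto dma_buf_begin_cpu_access() and dma_buf_end_cpu_access(). The following is only a minimal sketch: the helper names dmabuf_sync() and dmabuf_cpu_fill() are hypothetical, and it assumes the buffer has already been mapped by the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Hypothetical helper: issue one DMA_BUF_IOCTL_SYNC call on a dma-buf fd,
 * retrying when the call is interrupted. Returns 0 on success, -1 with
 * errno set on failure.
 */
static int dmabuf_sync(int dmabuf_fd, uint64_t flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/*
 * Hypothetical helper: fill an already mmap()ed dma-buf from the CPU.
 * The write is bracketed by SYNC_START and SYNC_END so the exporter can
 * do whatever cache maintenance the buffer needs. If the start of the
 * bracket fails, the mapping is not touched.
 */
static int dmabuf_cpu_fill(int dmabuf_fd, void *map, size_t len, uint8_t value)
{
    if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE) == -1)
        return -1;

    memset(map, value, len);

    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}
```

A caller would typically obtain map with mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, dmabuf_fd, 0) and use DMA_BUF_SYNC_READ instead of DMA_BUF_SYNC_WRITE when it only reads the buffer.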

This design was then used by us with various media players on different customer projects, including QNAP https://www.qnap.com/en/product/ts-877 as well as the newest Tesla https://www.amd.com/en/products/embedded-automotive-solutions

I won't go into the details here, but we are using exactly the approach I've outlined to let userspace control the DMA between the different devices in question. I'm one of the main designers of that, and our multimedia and Mesa teams have upstreamed quite a number of changes for this project.

I'm not that familiar with the different ARM-based solutions, because we are only recently getting results showing that this starts to work with AMD GPUs, but I'm pretty sure that the design should be able to handle that as well.

So we have clearly proven that this design works, even with special requirements which are way more complex than what we are discussing here. We had cases where we used GStreamer to feed DMA-buf handles into multiple devices with different format requirements and that seems to work fine.

-----

But enough of this rant. As I wrote to Lucas as well, this doesn't help us any further in the technical discussion.

The only technical argument I have is that if some userspace applications fail to use the provided UAPI while others use it correctly then this is clearly not a good reason to change the UAPI, but rather an argument to change the applications.

If the application should be kept simple and device independent, then allocating the buffer from the device-independent DMA heaps would be enough as well, because that provider implements the necessary handling for dma_buf_begin_cpu_access() and dma_buf_end_cpu_access().

I'm a bit surprised that we are arguing about stuff like this because we spent a lot of effort trying to document this. Daniel gave me the job to fix this documentation, but after reading through it multiple times now I can't seem to find where the design and the desired behavior are unclear.

What is clearly a bug in the kernel is that we don't reject things which won't work correctly, and this is what this patch here addresses. What we could talk about is backward compatibility for this patch, because it might look like it breaks things which previously used to work at least partially.

Regards,
Christian.


