On Thu, Jun 08, 2023 at 01:18:38AM +0800, Sui Jingfeng wrote: > Hi, > > On 2023/6/8 00:12, Paul Cercueil wrote: > > Hi Sui, > > > > Le mercredi 07 juin 2023 à 22:38 +0800, Sui Jingfeng a écrit : > > > Hi, welcome to discussion. > > > > > > > > > I have limited skills in manipulating English. > > > > > > It may not express what I'm really means in the short time. > > > > > > Part of word in the sentence may not as accurate as your. > > > > > > Well, please don't misunderstand, I'm not doing the rude to you. > > No problem. > > > > > I will explain it with more details. > > > > > > See below: > > > > > > > > > On 2023/6/7 20:09, Paul Cercueil wrote: > > > > Hi Sui, > > > > > > > > Le mercredi 07 juin 2023 à 18:30 +0800, Sui Jingfeng a écrit : > > > > > Hi, > > > > > > > > > > > > > > > On 2023/6/7 17:36, Paul Cercueil wrote: > > > > > > Hi Sui, > > > > > > > > > > > > Le mercredi 07 juin 2023 à 13:30 +0800, Sui Jingfeng a écrit : > > > > > > > The single map_noncoherent member of struct > > > > > > > drm_gem_dma_object > > > > > > > may > > > > > > > not > > > > > > > sufficient for describing the backing memory of the GEM > > > > > > > buffer > > > > > > > object. > > > > > > > > > > > > > > Especially on dma-coherent systems, the backing memory is > > > > > > > both > > > > > > > cached > > > > > > > coherent for multi-core CPUs and dma-coherent for peripheral > > > > > > > device. > > > > > > > Say architectures like X86-64, LoongArch64, Loongson Mips64, > > > > > > > etc. > > > > > > > > > > > > > > Whether a peripheral device is dma-coherent or not can be > > > > > > > implementation-dependent. The single map_noncoherent option > > > > > > > is > > > > > > > not > > > > > > > enough > > > > > > > to reflect real hardware anymore. For example, the Loongson > > > > > > > LS3A4000 > > > > > > > CPU > > > > > > > and LS2K2000/LS2K1000 SoC, peripheral device of such hardware > > > > > > > platform > > > > > > > allways snoop CPU's cache. Doing the allocation with > > > > > > > dma_alloc_coherent > > > > > > > function is preferred. The return buffer is cached, it should > > > > > > > not > > > > > > > using > > > > > > > the default write-combine mapping. While with the current > > > > > > > implement, > > > > > > > there > > > > > > > no way to tell the drm core to reflect this. > > > > > > > > > > > > > > This patch adds cached and coherent members to struct > > > > > > > drm_gem_dma_object. > > > > > > > which allow driver implements to inform the core. Introducing > > > > > > > new > > > > > > > mappings > > > > > > > while keeping the original default behavior unchanged. > > > > > > Did you try to simply set the "dma-coherent" property to the > > > > > > device's > > > > > > node? > > > > > But this approach can only be applied for the device driver with > > > > > DT > > > > > support. > > > > > > > > > > X86-64, Loongson ls3a4000 mips64, Loongson ls3a5000 CPU typically > > > > > do > > > > > not > > > > > have DT support. > > > > > > > > > > They using ACPI to pass parameter from the firmware to Linux > > > > > kernel. > > > > > > > > > > You approach will lost the effectiveness on such a case. > > > > Well, I don't really know how ACPI handles it - but it should just > > > > be a > > > > matter of setting dev->dma_coherent. That's basically what the DT > > > > code > > > > does. > > > > > > > > Some MIPS boards set it in their setup code for instance. > > > > > > > This is a *strategy*, not a *mechanism*. > > > > > > In this case, DT is just used to describing the hardware. > > > > > > (It is actually a hardware feature describing language, the > > > granularity > > > is large) > > > > > > It does not changing the state of the hardware. > > > > > > It's your platform firmware or kernel setting up code who actually do > > > such a things. > > > > > > > > > It's just that it works on *one* platform, it does not guarantee it > > > will > > > works on others. > > If you add the "dma-coherent" property in a device node in DT, you > > effectively specify that the device is DMA-coherent; so you describe > > the hardware, which is what DT is for, and you are not changing the > > state of the hardware. > > > > Note that some MIPS platforms (arch/mips/alchemy/common/setup.c) > > default to DMA-coherent mapping; I believe you could do something > > similar with your Loongson LS3A4000 CPU and LS2K2000/LS2K1000 SoC. > > > The preblem is that device driver can have various demand. > > It probably want to create different kind of buffers for different thing > simultaneously. > > Say, one allocated with dma_alloc_coherent for command buffer or dma > descriptor > > another one allocated with dma_alloc_wc for uploading shader etc. > > also has the third one allocated with dma_alloc_noncoherent() for doing some > else. And it will work just fine. struct device dma_coherent, or DT's dma-coherent property define that the device doesn't need any kind of cache maintenance, ever. If it's missing, we need to perform cache maintenance to keep coherency. dma_alloc_* functions provide guarantees to the driver. With dma_alloc_wc and dma_alloc_coherent, the buffer is coherent, and thus you don't need to perform cache maintenance operations by hand in the driver. With dma_alloc_noncoherent, the buffer is non-coherent and the driver needs to perform them when relevant. How those buffers are created is platform specific, but the guarantees provided *to the driver* are always there. A buffer allocated with dma_alloc_coherent might be provided by different means (at the hardware level with a coherency unit, by mapping it non-cacheable), but as far as the driver is concerned it's always going to be coherent. Similarly, a driver using dma_alloc_noncoherent will always require cache maintenance operations to use the API properly, even if the hardware provides coherency (in which case, those operations will be nop). So, yeah, like I was saying in the other mail, it looks like you're confusing a bunch of things. dma_alloc_* functions are about the driver expectations and guarantees. DT's dma-coherent property is about how we can implement them on a given platform. They don't have to match, and that's precisely how we can have drivers that run on any combination of platforms: the driver only cares about the buffer guarantees, the platform description takes care of how they are implemented. Maxime
Attachment:
signature.asc
Description: PGP signature