Hi Sui,

On Wednesday, 7 June 2023 at 18:30 +0800, Sui Jingfeng wrote:
> Hi,
>
> On 2023/6/7 17:36, Paul Cercueil wrote:
> > Hi Sui,
> >
> > On Wednesday, 7 June 2023 at 13:30 +0800, Sui Jingfeng wrote:
> > > The single map_noncoherent member of struct drm_gem_dma_object may
> > > not be sufficient for describing the backing memory of the GEM
> > > buffer object.
> > >
> > > Especially on dma-coherent systems, the backing memory is both
> > > cache-coherent for multi-core CPUs and dma-coherent for peripheral
> > > devices, on architectures like X86-64, LoongArch64, Loongson
> > > Mips64, etc.
> > >
> > > Whether a peripheral device is dma-coherent or not can be
> > > implementation-dependent. The single map_noncoherent option is not
> > > enough to reflect real hardware anymore. For example, on the
> > > Loongson LS3A4000 CPU and LS2K2000/LS2K1000 SoC, peripheral
> > > devices always snoop the CPU's caches. Doing the allocation with
> > > dma_alloc_coherent() is preferred there: the returned buffer is
> > > cached, and it should not use the default write-combine mapping.
> > > With the current implementation, there is no way to tell the DRM
> > > core to reflect this.
> > >
> > > This patch adds cached and coherent members to struct
> > > drm_gem_dma_object, which allow driver implementations to inform
> > > the core, introducing new mappings while keeping the original
> > > default behavior unchanged.
> >
> > Did you try to simply set the "dma-coherent" property in the
> > device's node?
>
> But this approach can only be applied to device drivers with DT
> support.
>
> X86-64, Loongson LS3A4000 MIPS64 and Loongson LS3A5000 CPUs typically
> do not have DT support.
>
> They use ACPI to pass parameters from the firmware to the Linux
> kernel.
>
> Your approach loses its effectiveness in such a case.
Well, I don't really know how ACPI handles it - but it should just be
a matter of setting dev->dma_coherent. That's basically what the DT
code does. Some MIPS boards set it in their setup code, for instance.

> > From what I understand, if you add that property then Linux will
> > use DMA coherent memory even though you use dma_alloc_noncoherent(),
> > and the sync_single_for_cpu() / sync_single_for_device() calls are
> > then NOPs.
>
> Please do not mitigate the problems with a confusing method.
>
> This approach not only tends to generate confusion but is also
> implementation-dependent and arch-dependent. It's definitely
> problematic.
>
> How dma_alloc_coherent()/dma_alloc_noncoherent() behaves is an
> arch-specific thing, dependent on how arch_dma_ops is implemented.
>
> The definition of "coherent" has different meanings on different
> arches.
>
> The definition of "write-combine" has different meanings on different
> arches.
>
> Write-combine (uncached acceleration) on MIPS is non dma-coherent.

It is dma-coherent on Ingenic SoCs.

> But on ARM, it seems that write-combine is coherent (guaranteed by
> the arch implementation).
>
> I have also heard that using dma_alloc_coherent() to allocate a
> buffer for a non-coherent device doesn't hurt, but the reverse is not
> true.
>
> But please do not create confusion.
>
> Software compositing is faster because of the better cache usage
> rate, and cache is faster to read.

It is faster because it is cached, not because it is non-coherent.

> Non-coherent is an arch thing and/or a driver-side thing;
> it is a side effect of using the cached mapping.

Yes, I know that.

> It should be left to the driver to handle such a side effect. The
> device driver knows its device, so it's the device driver's
> responsibility to maintain the coherency. On the Loongson platform,
> we don't need to call the drm_fb_dma_sync_non_coherent() function;
> it's already guaranteed by hardware.

I understand.
What I'm saying is that you should be able to set
dma_obj->map_noncoherent (which would arguably be better named
"map_cached", but that's a different problem). Then the GEM code would
end up calling dma_alloc_noncoherent(), which will give you *cached*
memory.

Then as long as dev->dma_coherent = true,
drm_fb_dma_sync_non_coherent() should be a NOP - so you wouldn't
pointlessly sync/invalidate the caches.

And I disagree with you, the driver shouldn't handle such things. The
fact that it is better to use cached memory or uncached with
write-combine really is platform-specific, and not something that the
driver should be aware of.

Cheers,
-Paul

> > Cheers,
> > -Paul
> >
> > > Signed-off-by: Sui Jingfeng <suijingfeng@xxxxxxxxxxx>
> > > ---
> > >  drivers/gpu/drm/drm_fb_dma_helper.c       | 11 +++++------
> > >  drivers/gpu/drm/drm_fbdev_dma.c           |  2 +-
> > >  drivers/gpu/drm/drm_gem_dma_helper.c      | 20 ++++++++++++++++----
> > >  drivers/gpu/drm/ingenic/ingenic-drm-drv.c |  5 ++++-
> > >  drivers/gpu/drm/rcar-du/Kconfig           |  2 --
> > >  drivers/gpu/drm/rcar-du/rcar_du_kms.c     |  4 +++-
> > >  include/drm/drm_gem_dma_helper.h          |  7 +++++--
> > >  7 files changed, 34 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_fb_dma_helper.c b/drivers/gpu/drm/drm_fb_dma_helper.c
> > > index 3b535ad1b07c..93ff05041192 100644
> > > --- a/drivers/gpu/drm/drm_fb_dma_helper.c
> > > +++ b/drivers/gpu/drm/drm_fb_dma_helper.c
> > > @@ -106,16 +106,15 @@ dma_addr_t drm_fb_dma_get_gem_addr(struct drm_framebuffer *fb,
> > >  EXPORT_SYMBOL_GPL(drm_fb_dma_get_gem_addr);
> > >
> > >  /**
> > > - * drm_fb_dma_sync_non_coherent - Sync GEM object to non-coherent backing
> > > - * memory
> > > + * drm_fb_dma_sync_non_coherent - Sync GEM object to cached backing memory
> > >   * @drm: DRM device
> > >   * @old_state: Old plane state
> > >   * @state: New plane state
> > >   *
> > >   * This function can be used by drivers that use damage clips and have
> > > - * DMA GEM objects backed by non-coherent memory. Calling this function
> > > - * in a plane's .atomic_update ensures that all the data in the backing
> > > - * memory have been written to RAM.
> > > + * DMA GEM objects backed by cached memory. Calling this function in a
> > > + * plane's .atomic_update ensures that all the data in the backing memory
> > > + * have been written to RAM.
> > >   */
> > >  void drm_fb_dma_sync_non_coherent(struct drm_device *drm,
> > >                                    struct drm_plane_state *old_state,
> > > @@ -131,7 +130,7 @@ void drm_fb_dma_sync_non_coherent(struct drm_device *drm,
> > >
> > >         for (i = 0; i < finfo->num_planes; i++) {
> > >                 dma_obj = drm_fb_dma_get_gem_obj(state->fb, i);
> > > -               if (!dma_obj->map_noncoherent)
> > > +               if (dma_obj->cached && dma_obj->coherent)
> > >                         continue;
> > >
> > >                 daddr = drm_fb_dma_get_gem_addr(state->fb, state, i);
> > > diff --git a/drivers/gpu/drm/drm_fbdev_dma.c b/drivers/gpu/drm/drm_fbdev_dma.c
> > > index d86773fa8ab0..49fe9b284cc8 100644
> > > --- a/drivers/gpu/drm/drm_fbdev_dma.c
> > > +++ b/drivers/gpu/drm/drm_fbdev_dma.c
> > > @@ -131,7 +131,7 @@ static int drm_fbdev_dma_helper_fb_probe(struct drm_fb_helper *fb_helper,
> > >
> > >         /* screen */
> > >         info->flags |= FBINFO_VIRTFB; /* system memory */
> > > -       if (dma_obj->map_noncoherent)
> > > +       if (dma_obj->cached)
> > >                 info->flags |= FBINFO_READS_FAST; /* signal caching */
> > >         info->screen_size = sizes->surface_height * fb->pitches[0];
> > >         info->screen_buffer = map.vaddr;
> > > diff --git a/drivers/gpu/drm/drm_gem_dma_helper.c b/drivers/gpu/drm/drm_gem_dma_helper.c
> > > index 870b90b78bc4..dec1d512bdf1 100644
> > > --- a/drivers/gpu/drm/drm_gem_dma_helper.c
> > > +++ b/drivers/gpu/drm/drm_gem_dma_helper.c
> > > @@ -93,7 +93,11 @@ __drm_gem_dma_create(struct drm_device *drm, size_t size, bool private)
> > >                 drm_gem_private_object_init(drm, gem_obj, size);
> > >
> > >                 /* Always use writecombine for dma-buf mappings */
> > > -               dma_obj->map_noncoherent = false;
> > > +               /* FIXME: This is not always true; on some dma-coherent systems,
> > > +                * cached mappings should be preferred over writecombine
> > > +                */
> > > +               dma_obj->cached = false;
> > > +               dma_obj->coherent = false;
> > >         } else {
> > >                 ret = drm_gem_object_init(drm, gem_obj, size);
> > >         }
> > > @@ -143,7 +147,11 @@ struct drm_gem_dma_object *drm_gem_dma_create(struct drm_device *drm,
> > >         if (IS_ERR(dma_obj))
> > >                 return dma_obj;
> > >
> > > -       if (dma_obj->map_noncoherent) {
> > > +       if (dma_obj->cached && dma_obj->coherent) {
> > > +               dma_obj->vaddr = dma_alloc_coherent(drm->dev, size,
> > > +                                                   &dma_obj->dma_addr,
> > > +                                                   GFP_KERNEL | __GFP_NOWARN);
> > > +       } else if (dma_obj->cached && !dma_obj->coherent) {
> > >                 dma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
> > >                                                        &dma_obj->dma_addr,
> > >                                                        DMA_TO_DEVICE,
> > > @@ -153,6 +161,7 @@ struct drm_gem_dma_object *drm_gem_dma_create(struct drm_device *drm,
> > >                                                &dma_obj->dma_addr,
> > >                                                GFP_KERNEL | __GFP_NOWARN);
> > >         }
> > > +
> > >         if (!dma_obj->vaddr) {
> > >                 drm_dbg(drm, "failed to allocate buffer with size %zu\n",
> > >                         size);
> > > @@ -233,7 +242,10 @@ void drm_gem_dma_free(struct drm_gem_dma_object *dma_obj)
> > >                         dma_buf_vunmap_unlocked(gem_obj->import_attach->dmabuf, &map);
> > >                 drm_prime_gem_destroy(gem_obj, dma_obj->sgt);
> > >         } else if (dma_obj->vaddr) {
> > > -               if (dma_obj->map_noncoherent)
> > > +               if (dma_obj->cached && dma_obj->coherent)
> > > +                       dma_free_coherent(gem_obj->dev->dev, dma_obj->base.size,
> > > +                                         dma_obj->vaddr, dma_obj->dma_addr);
> > > +               else if (dma_obj->cached && !dma_obj->coherent)
> > >                         dma_free_noncoherent(gem_obj->dev->dev, dma_obj->base.size,
> > >                                              dma_obj->vaddr, dma_obj->dma_addr,
> > >                                              DMA_TO_DEVICE);
> > > @@ -532,7 +544,7 @@ int drm_gem_dma_mmap(struct drm_gem_dma_object *dma_obj, struct vm_area_struct *
> > >         vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
> > >         vm_flags_mod(vma, VM_DONTEXPAND, VM_PFNMAP);
> > >
> > > -       if (dma_obj->map_noncoherent) {
> > > +       if (dma_obj->cached) {
> > >                 vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> > >
> > >                 ret = dma_mmap_pages(dma_obj->base.dev->dev,
> > > diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> > > index 5ec75e9ba499..a3df2f99a757 100644
> > > --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> > > +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> > > @@ -919,7 +919,10 @@ ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
> > >         if (!obj)
> > >                 return ERR_PTR(-ENOMEM);
> > >
> > > -       obj->map_noncoherent = priv->soc_info->map_noncoherent;
> > > +       if (priv->soc_info->map_noncoherent) {
> > > +               obj->cached = true;
> > > +               obj->coherent = false;
> > > +       }
> > >
> > >         return &obj->base;
> > >  }
> > > diff --git a/drivers/gpu/drm/rcar-du/Kconfig b/drivers/gpu/drm/rcar-du/Kconfig
> > > index 53c356aed5d5..dddc70c08bdc 100644
> > > --- a/drivers/gpu/drm/rcar-du/Kconfig
> > > +++ b/drivers/gpu/drm/rcar-du/Kconfig
> > > @@ -2,8 +2,6 @@
> > >  config DRM_RCAR_DU
> > >         tristate "DRM Support for R-Car Display Unit"
> > >         depends on DRM && OF
> > > -       depends on ARM || ARM64
> > > -       depends on ARCH_RENESAS || COMPILE_TEST
> > >         select DRM_KMS_HELPER
> > >         select DRM_GEM_DMA_HELPER
> > >         select VIDEOMODE_HELPERS
> > > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> > > index adfb36b0e815..1142d51473e6 100644
> > > --- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> > > +++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> > > @@ -386,7 +386,9 @@ struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct drm_device *dev,
> > >         gem_obj->funcs = &rcar_du_gem_funcs;
> > >
> > >         drm_gem_private_object_init(dev, gem_obj, attach->dmabuf->size);
> > > -       dma_obj->map_noncoherent = false;
> > > +
> > > +       dma_obj->cached = false;
> > > +       dma_obj->coherent = false;
> > >
> > >         ret = drm_gem_create_mmap_offset(gem_obj);
> > >         if (ret) {
> > > diff --git a/include/drm/drm_gem_dma_helper.h b/include/drm/drm_gem_dma_helper.h
> > > index 8a043235dad8..585ce3d4d1eb 100644
> > > --- a/include/drm/drm_gem_dma_helper.h
> > > +++ b/include/drm/drm_gem_dma_helper.h
> > > @@ -16,7 +16,9 @@ struct drm_mode_create_dumb;
> > >   * more than one entry but they are guaranteed to have contiguous
> > >   * DMA addresses.
> > >   * @vaddr: kernel virtual address of the backing memory
> > > - * @map_noncoherent: if true, the GEM object is backed by non-coherent memory
> > > + * @cached: if true, the GEM object is backed by cached memory
> > > + * @coherent: This option is only meaningful when a GEM object is cached.
> > > + *            If true, syncing the GEM object for DMA access is not required.
> > >   */
> > >  struct drm_gem_dma_object {
> > >         struct drm_gem_object base;
> > > @@ -26,7 +28,8 @@ struct drm_gem_dma_object {
> > >         /* For objects with DMA memory allocated by GEM DMA */
> > >         void *vaddr;
> > >
> > > -       bool map_noncoherent;
> > > +       bool cached;
> > > +       bool coherent;
> > >  };
> > >
> > >  #define to_drm_gem_dma_obj(gem_obj) \