On Wednesday, 21.06.2023 at 23:54 +0800, Sui Jingfeng wrote:
> Hi,
>
> On 2023/6/21 23:33, Lucas Stach wrote:
> > On Wednesday, 21.06.2023 at 23:00 +0800, Sui Jingfeng wrote:
> > > On 2023/6/21 18:00, Lucas Stach wrote:
> > > > >   static inline enum dma_data_direction etnaviv_op_to_dma_dir(u32 op)
> > > > > @@ -369,6 +381,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> > > > >   {
> > > > >       struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> > > > >       struct drm_device *dev = obj->dev;
> > > > > +     struct etnaviv_drm_private *priv = dev->dev_private;
> > > > >       bool write = !!(op & ETNA_PREP_WRITE);
> > > > >       int ret;
> > > > >
> > > > > @@ -395,7 +408,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> > > > >               return ret == 0 ? -ETIMEDOUT : ret;
> > > > >       }
> > > > >
> > > > > -     if (etnaviv_obj->flags & ETNA_BO_CACHED) {
> > > > > +     if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
> > > > Why do you need this? Isn't dma_sync_sgtable_for_cpu a no-op on your
> > > > platform when the device is coherent?
> > > >
> > > I need this to show that our hardware is truly dma-coherent!
> > >
> > > I have tested that the driver still works like a charm without adding
> > > this code '!priv->dma_coherent'.
> > >
> > >
> > > But I'm expressing the idea that a truly dma-coherent just device don't
> > > need this.
> > >
> > > I don't care if it is a no-op.
> > >
> > > It is now, it may not in the future.
> > And that's exactly the point. If it ever turns into something more than
> > a no-op on your platform, then that's probably for a good reason and a
> > driver should not assume that it knows better than the DMA API
> > implementation what is or is not required on a specific platform to
> > make DMA work.
> >
> > > Even it is, the overhead of function call itself still get involved.
> > >
> > cpu_prep/fini aren't total fast paths, you already synchronized with
> > the GPU here, potentially waiting for jobs to finish, etc. If your
> > platform no-ops this then the function call will be in the noise.
> >
> > > Also, we want to try flush the write buffer with the CPU manually.
> > >
> > >
> > > Currently, we want the absolute correctness in the concept,
> > >
> > > not only the rendering results.
> > And if you want absolute correctness then calling dma_sync_sgtable_* is
> > the right thing to do, as it can do much more than just manage caches.
>
> For our hardware, cached mapping don't need calling dma_sync_sgtable_*.
>
> This is the the right thing to do. The hardware already guarantee it for
> use.
>
And as the HW guarantees it on your platform, your platform
implementation makes this function effectively a no-op. Skipping the
call to this function is breaking the DMA API abstraction, as now the
driver is second guessing the DMA API implementation. I really see no
reason to do this.

>
> We may only want to call it for WC mapping BO, please don't tangle all
> of this together.
>
> We simply want to do the right thing.
>
> > Right now it also provides SWIOTLB translation if needed.
>
> SWIOTLB introduce the bounce buffer, slower the performance.
>
> We don't need it. It should be avoid.

Sure. If your platform doesn't need it, that's totally fine. But you
can't guarantee that all platforms with coherent Vivante GPUs don't
need this. If it isn't needed the DMA API implementation will skip it
just fine at almost no cost, so the driver really shouldn't try to be
more clever than the platform DMA API implementation.

Regards,
Lucas
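
To make the abstraction argument concrete, here is a minimal userspace
sketch in plain C. It is not kernel code and does not use the real DMA
API: struct mock_platform, mock_sync_for_cpu(), mock_cpu_prep(),
mock_cpu_prep_skip() and MOCK_BO_CACHED are all hypothetical names made
up for illustration. The assumption modelled here is a platform that is
cache coherent but still needs a bounce-buffer copy on sync, which is
the kind of case the driver cannot rule out on its own.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one platform's DMA implementation. */
struct mock_platform {
	bool cache_coherent;	/* device snoops the CPU caches */
	bool uses_bounce;	/* a SWIOTLB-like bounce buffer is active */
	int cache_syncs;	/* cache maintenance operations performed */
	int bounce_copies;	/* bounce-buffer copies performed */
};

#define MOCK_BO_CACHED 0x1u	/* hypothetical flag, mirrors the idea of a cached BO */

/*
 * Mock "sync for CPU" hook. The platform decides what work is needed:
 * nothing at all on a coherent platform without bounce buffers.
 */
void mock_sync_for_cpu(struct mock_platform *p)
{
	if (!p->cache_coherent)
		p->cache_syncs++;
	if (p->uses_bounce)
		p->bounce_copies++;
}

/* Driver path that always defers to the platform for cached BOs. */
void mock_cpu_prep(struct mock_platform *p, unsigned int bo_flags)
{
	if (bo_flags & MOCK_BO_CACHED)
		mock_sync_for_cpu(p);
}

/*
 * Driver path that skips the sync when the driver's own flag says the
 * device is coherent, i.e. the driver second guesses the platform.
 */
void mock_cpu_prep_skip(struct mock_platform *p, unsigned int bo_flags,
			bool drv_coherent)
{
	if (!drv_coherent && (bo_flags & MOCK_BO_CACHED))
		mock_sync_for_cpu(p);
}
```

With a coherent platform that has no bounce buffer, mock_cpu_prep()
ends up doing no work, so the unconditional call costs one function
call and nothing else. With a coherent platform that does use a bounce
buffer, mock_cpu_prep() performs the copy while mock_cpu_prep_skip()
with drv_coherent set silently misses it.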