Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Sui Jingfeng <suijingfeng@xxxxxxxxxxx> · Sun, 25 Jun 2023 12:04:13 +0800

Hi,

On 2023/6/22 01:53, Lucas Stach wrote:
Am Donnerstag, dem 22.06.2023 um 01:31 +0800 schrieb Sui Jingfeng:
Hi,

On 2023/6/22 00:07, Lucas Stach wrote:
And as the HW guarantees it on your platform, your platform
implementation makes this function effectively a no-op. Skipping the
call to this function is breaking the DMA API abstraction, as now the
driver is second guessing the DMA API implementation. I really see no
reason to do this.
It is the same reason you chose the word 'effectively', not 'definitely'.

We don't want to waste the CPU's time

   running the dma_sync_sg_for_cpu() function:

```

void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
              int nelems, enum dma_data_direction dir)
{
      const struct dma_map_ops *ops = get_dma_ops(dev);

      BUG_ON(!valid_dma_direction(dir));
      if (dma_map_direct(dev, ops))
          dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
      else if (ops->sync_sg_for_cpu)
          ops->sync_sg_for_cpu(dev, sg, nelems, dir);
      debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
}

```

   compared to running this:

```

int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
{
      struct drm_device *dev = obj->dev;
      struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
      struct etnaviv_drm_private *priv = dev->dev_private;

      if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
          /* fini without a prep is almost certainly a userspace error */
          WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
          dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
                  etnaviv_op_to_dma_dir(etnaviv_obj->last_cpu_prep_op));
          etnaviv_obj->last_cpu_prep_op = 0;
      }

      return 0;
}

```

My judgment as the maintainer of this driver is that the small CPU
overhead of calling this function is very well worth it, if the
alternative is breaking the DMA API abstractions.
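
As a rough illustration of the abstraction argument above, here is a toy model in plain C (not kernel code). All toy_* names are hypothetical and only stand in for the real DMA layer, on the assumption that the coherency check lives inside the sync helper. The driver calls the sync helper unconditionally, and the helper itself skips cache maintenance for a coherent device:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct device; only the coherency flag is modelled. */
struct toy_device {
    bool dma_coherent;
    unsigned int cache_ops; /* counts simulated cache maintenance operations */
};

/* Toy stand-in for the DMA layer's sync helper (hypothetical name). */
static void toy_dma_sync_for_cpu(struct toy_device *dev, size_t nents)
{
    size_t i;

    for (i = 0; i < nents; i++) {
        /* Coherent hardware needs no cache maintenance for this entry. */
        if (!dev->dma_coherent)
            dev->cache_ops++;
    }
}

/* Driver-side code: no coherency check here, the DMA layer decides. */
static void toy_driver_cpu_prep(struct toy_device *dev, size_t nents)
{
    toy_dma_sync_for_cpu(dev, nents);
}
```

In this sketch the cost on a coherent device is a function call and a short loop, and the driver never needs to know how the platform implements the sync.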

But this is acceptable, because we can drop the GEM_CPU_PREP and
GEM_CPU_FINI ioctls entirely

in userspace for cached buffers, as they are not needed at all for cached
mappings on our platform.

And that statement isn't true either.

Yes, you are right here. I admit.

I ran into exactly this problem in the past when developing
xf86-video-loongson.

The root cause, I think, is that the CPU doesn't know when the GPU has
finished rendering,

or that some data still resides in the GPU's cache.

We have to call the etna_bo_cpu_prep(etna_bo, DRM_ETNA_PREP_READ) function

to make sure the data fetched by the CPU is the latest.

I learned about this issue about five months ago, earlier this year, see [1]
for reference.

I just forgot about it while debating with you.

[1] 
https://gitlab.freedesktop.org/longxin2019/xf86-video-loongson/-/commit/95f9596eb19223c3109ea1f32c3e086fd1d43bd8
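
To make the synchronization problem described above concrete, here is a small toy model in plain C. It is only a sketch: the toy_* names are hypothetical and do not correspond to the libdrm or kernel API, and the GPU fence is simulated with a flag. It shows a CPU read returning stale data until a prep step has waited for the pending GPU job:

```c
#include <stdbool.h>

/* Toy buffer object: what the CPU sees vs. what the GPU will write. */
struct toy_bo {
    int cpu_visible;  /* value a CPU read returns right now */
    int gpu_pending;  /* value the in-flight GPU job will write */
    bool gpu_busy;    /* true while the job's fence is unsignalled */
};

/* Queue a simulated GPU job that will eventually write 'value'. */
static void toy_gpu_submit(struct toy_bo *bo, int value)
{
    bo->gpu_pending = value;
    bo->gpu_busy = true;
}

/* Model of a prep-for-read: wait for the fence, then the result is visible. */
static void toy_bo_cpu_prep(struct toy_bo *bo)
{
    if (bo->gpu_busy) {
        bo->cpu_visible = bo->gpu_pending;
        bo->gpu_busy = false;
    }
}

/* Plain CPU read of the buffer contents. */
static int toy_bo_read(const struct toy_bo *bo)
{
    return bo->cpu_visible;
}
```

In this model, a read issued right after toy_gpu_submit() still sees the old value. Only after toy_bo_cpu_prep() does the CPU see what the GPU rendered, which matches the role etna_bo_cpu_prep(etna_bo, DRM_ETNA_PREP_READ) plays in the real driver, independent of cache coherency.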


  The CPU_PREP/FINI ioctls also
provide fence synchronization between CPU and GPU.

You are correct here.

There are a few very
specific cases where skipping those ioctls is acceptable (mostly when
the userspace driver explicitly wants unsynchronized access), but in
most cases they are required for correctness.

OK, you are extremely correct.

Regards,
Lucas

--
Jingfeng