On 18.06.21 at 20:45, Daniel Vetter wrote:
On Fri, Jun 18, 2021 at 8:02 PM Christian König
<christian.koenig@xxxxxxx> wrote:
On 18.06.21 at 19:20, Daniel Vetter wrote:
[SNIP]
The whole thing was introduced with this commit here:
commit f2c24b83ae90292d315aa7ac029c6ce7929e01aa
Author: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxx>
Date: Wed Apr 2 17:14:48 2014 +0200
drm/ttm: flip the switch, and convert to dma_fence
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxx>
int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
....
- bo->sync_obj = driver->sync_obj_ref(sync_obj);
+ reservation_object_add_excl_fence(bo->resv, fence);
if (evict) {
Maarten replaced the bo->sync_obj reference with the dma_resv exclusive
fence.
This means that we need to apply the sync_obj semantics to all drivers
using a DMA-buf with its dma_resv object, otherwise you break imports
from TTM drivers.
Since then and up till now the exclusive fence must be waited on and
never replaced with anything which signals before the old fence.
Maarten and, I think, Thomas did that, and I was always assuming that
you knew about this design decision.
Surprisingly I do actually know this.
Still the commit you cite did _not_ change any of the rules around
dma_buf: Importers have _no_ obligation to obey the exclusive fence,
because the buffer is pinned. None of the work that Maarten has done
has fundamentally changed this contract in any way.
Well, I now agree that the rules around dma_resv are different from what
I thought, but this change should have raised more eyebrows.
The problem is that this completely broke interop with all drivers using
TTM, and I think it might even explain some bug reports.
I re-introduced the moving fence by adding bo->moving a few years after
the initial introduction of dma_resv, but that was just to work around
performance problems introduced by using the exclusive fence for both
use cases.
If amdgpu (or any other ttm based driver) hands back an sgt without
waiting for ttm_bo->moving or the exclusive fence first, then that's a
bug we need to fix in these drivers. But if ttm based drivers did get
this wrong, then they got this wrong both before and after the switch
over to using dma_resv - this bug would go back all the way to Dave's
introduction of drm_prime.c and support for that.
I'm not 100% sure, but I think before the switch to the dma_resv object
drivers just waited for the BOs to become idle and that should have
prevented this.
Anyway, let's stop discussing history and move forward. Sending patches
for all affected TTM drivers with CC: stable tags in a minute.
The only thing which importers have to do is not wreck the DAG nature
of the dma_resv fences by dropping dependencies. Currently there's a
handful of drivers which break this (introduced over the last few
years), and I have it somewhere on my todo list to audit & fix them all.
Please give that some priority.
Ignoring the moving fence is an information leak, but messing up the DAG
gives you access to freed up memory.
The goal with extracting dma_resv from ttm was to make implicit sync
work and get rid of some terrible stalls on the userspace side.
Eventually it was also the goal to make truly dynamic buffer
reservation possible, but that took another 6 or so years to realize
with your work. And we had to make dynamic dma-buf very much opt-in,
because auditing all the users is very hard work and no one
volunteered. And for dynamic dma-buf the rule is that the exclusive
fence must _never_ be ignored, and the two drivers supporting it (mlx5
and amdgpu) obey that.
So yeah for ttm drivers dma_resv is primarily for memory management,
with a side effect of also supporting implicit sync.
For everyone else (and this includes a pile of render drivers, all the
atomic kms drivers, v4l and I have no idea what else on top) dma_resv
was only ever about implicit sync, and it can be ignored. And it (the
implicit sync side) has to be ignored to be able to support vulkan
winsys buffers correctly without stalling where we shouldn't. Also we
have to ignore it on atomic kms side too (and depending upon whether
writeback is supported atomic kms is perfectly capable of reading out
any buffer passed to it).
Oh! That might actually explain some issues, but that just completely
breaks when TTM based drivers use atomic.
In other words, for the first use case it is actually rather likely for
TTM based drivers to need to move the buffer around so that scanout is
possible.
And that in turn means you need to wait for this move to finish even if
you have an explicit fence to wait for. IIRC amdgpu rolled its own
implementation of this and radeon doesn't have atomic, but nouveau is
most likely broken.
So we do need a better solution for this sooner or later.
It's absolutely not that this is my invention, I'm just telling you how
it has always been.
Anyway this means we have a serious misunderstanding and yes now some
of our discussions about dynamic P2P suddenly make much more sense.
Yeah I think at least we finally managed to get this across.
Anyway I guess w/e for me now, otherwise we'll probably resort to
throwing chairs :-) I'm dearly hoping the thunderstorms all around me
actually get all the way to me, because it's way, way too hot here
right now.
Well, it's probably rather Dave or Linus who might start throwing chairs
at us for not getting this straight sooner.
At least the weather is getting more bearable.
Cheers,
Christian.
Cheers, Daniel
Regards,
Christian.
_only_ when you have a dynamic importer/exporter can you assume that
the dma_resv fences must actually be obeyed. That's one of the reasons
why we had to make this a completely new mode (the other one was
locking, but they really tie together).
Wrt your problems:
a) needs to be fixed in drivers exporting buffers and failing to make
sure the memory is there by the time dma_buf_map_attachment returns.
b) needs to be fixed in the importers, and there's quite a few of
those. There's more than i915 here, which is why I think we should
have the dma_resv_add_shared_exclusive helper extracted from amdgpu.
Avoids hand-rolling this about 5 times (6 if we include the import
ioctl from Jason).
Also, I've been trying to explain this ever since the entire dynamic
dma-buf thing started.
-Daniel