Re: dma-resv ongoing discussion

Hi Dave and of course everybody else,

Am 24.05.21 um 04:03 schrieb Dave Airlie:
I'd like to try and summarise where I feel we are all at with respect
to the dma-buf discussions. I think I've gotten a fairly good idea of
how things stand, but I'm not sure we are really getting to the
how-to-move-things-forward stage, which is probably when I need to step
in. Thanks for keeping this as respectful as it has been; I understand
it can be difficult. I also think we are starting to find we've moved
the knob too far towards driver development happening in company silos
with acceleration features, and hopefully with this and the TTM work
etc. we can start to push back towards upstream-first designs.

I think Jason[1] summed up my feelings on this the best. We have a
dma-buf inter-driver contract that has a design issue. We didn't fix
that initially, and now we have amdgpu as the outlier in a world where
everyone else agreed to the contract.

a) Christian wants to try and move forward with fixing the world of
dma-buf design across all drivers, but hasn't come up with a plan for
doing so apart from amdgpu/i915. I think one strength Daniel has here
is that he's good at coming up with plans that change the ecosystem.
I'd really like to see some concrete effort to work out how much work
fixing this across the ecosystem is and whether it is possible. I
expect Daniel's big huge monster commit message summary of the current
drivers is a great place to start for this. That is, if we can agree
that dma-buf is broken and on what dma-buf should look like tomorrow.

Well, to clarify: I don't want to move forward to implement new features, but rather to fix existing shortcomings.

From my point of view the main purpose of the dma_resv object is to provide a container for dma_fence objects for different use cases.

Those use cases are:
1. Resource management.
2. Implicit synchronization.
3. Information about current operations.

Now I think I can summarize the problem I'm seeing: the focus of the design leans too much towards a single use case here.

For example, for resource management alone I need to be able to add any fence at any time to the resv object without any restriction.
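To illustrate this, below is a minimal sketch (not code from any particular driver) of how fences end up in a dma_resv object with the current interface; error handling is reduced to the bare minimum:

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>

/* Minimal sketch: implicit synchronization uses the exclusive slot,
 * resource management uses the shared slots. */
static int example_attach_fence(struct dma_resv *resv,
				struct dma_fence *job_fence,
				bool is_write)
{
	int ret;

	ret = dma_resv_lock(resv, NULL);
	if (ret)
		return ret;

	if (is_write) {
		/* Implicit synchronization: everybody else must wait. */
		dma_resv_add_excl_fence(resv, job_fence);
	} else {
		/* Resource management: just note that the BO is busy. */
		ret = dma_resv_reserve_shared(resv, 1);
		if (!ret)
			dma_resv_add_shared_fence(resv, job_fence);
	}

	dma_resv_unlock(resv);
	return ret;
}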

b) Daniel is coming from the side of let's bring amdgpu into the fold
first, then if the problem still exists we can move everything forward
together. He intends to point out how alone amdgpu is here, and wants
to try and create a uapi that at least mitigates the biggest problems
with moving amdgpu to the common model first. I'd like to know if this
is at least a possibility as an alternate route. I understand AMD have
some goals to reach here, but I think we've dug a massive hole, and
paying off the tech debt is going to have to delay those goals if we
are to keep upstream sane.

I don't think we can do this so easily without breaking uAPI.

Userspace in the form of both RADV and AMDVLK depends on that behavior, and we still have the original video decode use case this was invented for.

I'm slowly paging in all of the technical details as I go. I'd like to
see more thought around Daniel's idea of fixing the amdgpu oversync
with TLB flushing, as it really doesn't make much sense that TLB
flushing on process teardown stalls out other processes using the
shared buffer; it should only stall out moving the pages. If that then
allows aligning amdgpu for now and we can work out how to fix (a),
then that would rock.

Well, this is exactly what I've been trying to do by adding those flags to the shared fences, but Daniel already convinced me that this is too invasive as a first step.

And while this over-synchronization is annoying, it has already been there for a very long time and only affects the case where the BO is shared between devices.

So for the moment I'm pondering the question of what would be the absolute minimum change necessary to get amdgpu to use the exclusive fence in the same way other drivers do.

And I think I can summarize this into two things:
1. We make it possible to add shared fences which are not synchronized to the exclusive fence.
2. We make it possible to replace the exclusive fence without removing all the shared fences.
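To make that a bit more concrete, here is a rough sketch of what those two primitives could look like. The names and the implementation are purely hypothetical and only meant to illustrate the idea; they are not existing dma_resv interfaces:

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>

/* 1. Hypothetical: add a shared fence which is explicitly *not* ordered
 *    after the exclusive fence, e.g. for pure resource management. A
 *    real implementation would need some kind of flag per shared slot. */
void dma_resv_add_shared_fence_unsynced(struct dma_resv *obj,
					struct dma_fence *fence);

/* 2. Hypothetical: install a new exclusive fence while keeping the
 *    already queued shared fences, instead of dropping them like
 *    dma_resv_add_excl_fence() does today. */
void dma_resv_replace_excl_fence(struct dma_resv *obj,
				 struct dma_fence *fence)
{
	struct dma_fence *old;

	dma_resv_assert_held(obj);

	old = dma_resv_get_excl(obj);
	dma_fence_get(fence);
	rcu_assign_pointer(obj->fence_excl, fence);
	/* Note: the shared fence list is intentionally left untouched. */
	dma_fence_put(old);
}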

With that in place I'm able to change amdgpu so that we can fill in the exclusive fence during CS with chain nodes and keep the synchronization model for existing amdgpu uAPI the same.
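As a rough illustration of that last step, the CS code could do something along these lines (again only a sketch, reusing the hypothetical dma_resv_replace_excl_fence() from above; the real patch will obviously look different):

#include <linux/dma-fence-chain.h>
#include <linux/dma-resv.h>
#include <linux/slab.h>

static int example_cs_fill_excl(struct dma_resv *resv,
				struct dma_fence *cs_fence)
{
	struct dma_fence_chain *chain;
	struct dma_fence *prev;

	dma_resv_assert_held(resv);

	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
	if (!chain)
		return -ENOMEM;

	/* The chain node takes over these two references. A real
	 * implementation would use a monotonic per-context sequence
	 * number instead of 1. */
	prev = dma_fence_get(dma_resv_get_excl(resv));
	dma_fence_chain_init(chain, prev, dma_fence_get(cs_fence), 1);

	/* Hypothetical helper from the sketch above: install the chain
	 * as the new exclusive fence without dropping the shared fences. */
	dma_resv_replace_excl_fence(resv, &chain->base);
	dma_fence_put(&chain->base);

	return 0;
}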

Regards,
Christian.

Please correct me where I'm wrong here and definitely if I've
misrepresented anyone's positions.

Dave.


[1] https://lore.kernel.org/dri-devel/a1925038-5c3c-0193-1870-27488caa2577@xxxxxxxxx/T/#md800f00476ca1869a81b02a28cb2fabc1028c6be



