Re: dma-resv ongoing discussion

Hi Dave and of course everybody else,

Am 24.05.21 um 04:03 schrieb Dave Airlie:
I'd like to try and summarise where I feel we are all at with respect
to the dma-buf discussions. I think I've gotten a fairly good idea of
how things stand, but I'm not sure we are really getting to the
how-to-move-things-forward stage, which is probably when I need to step
in. Thanks for keeping this as respectful as it has been; I understand
it can be difficult. I also think we are starting to find we've moved
the knob too far towards driver development happening in company silos
with acceleration features, and hopefully with this and the TTM work
etc. we can start to push back towards upstream-first designs.

I think Jason[1] summed up my feelings on this the best. We have a
dma-buf inter-driver contract that has a design issue. We didn't fix
that initially, and now we have amdgpu as the outlier in a world where
everyone else agreed to the contract.

a) Christian wants to try and move forward with fixing the world of
dma-buf design across all drivers, but hasn't come up with a plan for
doing so apart from amdgpu/i915. I think one strength Daniel has here
is that he's good at coming up with plans that change the ecosystem.
I'd really like to see some concrete effort to work out how much work
fixing this across the ecosystem is and whether it is possible. I
expect Daniel's big huge monster commit message summary of the current
drivers is a great place to start for this. That is, if we can agree
that dma-buf is broken and on what dma-buf should look like tomorrow.

Well, to clarify: I don't want to move forward to implement new features, but rather to fix existing shortcomings.

From my point of view the main purpose of the dma_resv object is to provide a container for dma_fence objects for different use cases.

Those use cases are:
1. Resource management.
2. Implicit synchronization.
3. Information about current operations.

Now I think I can summarize the problem I'm seeing: the focus of the design leans too much towards a single use case here.

For example, for resource management alone I need to be able to add any fence at any time to the resv object without any restriction.
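To illustrate this, below is a minimal sketch (not code from any particular driver) of how fences end up in a dma_resv object with the current interface; error handling is reduced to the bare minimum:

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>

/* Minimal sketch: implicit synchronization uses the exclusive slot,
 * resource management uses the shared slots. */
static int example_attach_fence(struct dma_resv *resv,
				struct dma_fence *job_fence,
				bool is_write)
{
	int ret;

	ret = dma_resv_lock(resv, NULL);
	if (ret)
		return ret;

	if (is_write) {
		/* Implicit synchronization: everybody else must wait. */
		dma_resv_add_excl_fence(resv, job_fence);
	} else {
		/* Resource management: just note that the BO is busy. */
		ret = dma_resv_reserve_shared(resv, 1);
		if (!ret)
			dma_resv_add_shared_fence(resv, job_fence);
	}

	dma_resv_unlock(resv);
	return ret;
}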

b) Daniel is coming from the side of let's bring amdgpu into the fold
first, then if the problem still exists we can move everything forward
together. He intends to point out how alone amdgpu is here, and wants
to try and create a uapi that at least mitigates the biggest problems
with moving amdgpu to the common model first. I'd like to know if this
is at least a possibility as an alternate route. I understand AMD have
some goals to reach here, but I think we've dug a massive hole, and
paying off the tech debt is going to have to delay those goals if we
are to keep upstream sane.

I don't think we can do this so easily without breaking uAPI.

Userspace in the form of both RADV and AMDVLK depends on that behavior, and we still have the original video decode use case this was invented for.

I'm slowly paging in all of the technical details as I go. I'd like to
see more thought around Daniel's idea of fixing the amdgpu oversync
with TLB flushing, as it really doesn't make much sense that TLB
flushing on process teardown stalls out other processes using the
shared buffer; it should only stall out moving the pages. If that then
allows aligning amdgpu for now and we can work out how to fix (a),
then that would rock.

Well, this is exactly what I've been trying to do by adding those flags to the shared fences, but Daniel already convinced me that this is too invasive as a first step.

And while this over-synchronization is annoying, it has already been there for a very long time and only affects the case where the BO is shared between devices.

So for the moment I'm pondering the question of what would be the absolute minimum change necessary to get amdgpu to use the exclusive fence in the same way other drivers do.

And I think I can summarize this into two things:
1. We make it possible to add shared fences which are not synchronized to the exclusive fence.
2. We make it possible to replace the exclusive fence without removing all the shared fences.
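To make that a bit more concrete, here is a rough sketch of what those two primitives could look like. The names and the implementation are purely hypothetical and only meant to illustrate the idea; they are not existing dma_resv interfaces:

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>

/* 1. Hypothetical: add a shared fence which is explicitly *not* ordered
 *    after the exclusive fence, e.g. for pure resource management. A
 *    real implementation would need some kind of flag per shared slot. */
void dma_resv_add_shared_fence_unsynced(struct dma_resv *obj,
					struct dma_fence *fence);

/* 2. Hypothetical: install a new exclusive fence while keeping the
 *    already queued shared fences, instead of dropping them like
 *    dma_resv_add_excl_fence() does today. */
void dma_resv_replace_excl_fence(struct dma_resv *obj,
				 struct dma_fence *fence)
{
	struct dma_fence *old;

	dma_resv_assert_held(obj);

	old = dma_resv_get_excl(obj);
	dma_fence_get(fence);
	rcu_assign_pointer(obj->fence_excl, fence);
	/* Note: the shared fence list is intentionally left untouched. */
	dma_fence_put(old);
}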

With that in place I'm able to change amdgpu so that we can fill in the exclusive fence during CS with chain nodes and keep the synchronization model for existing amdgpu uAPI the same.
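As a rough illustration of that last step, the CS code could do something along these lines (again only a sketch, reusing the hypothetical dma_resv_replace_excl_fence() from above; the real patch will obviously look different):

#include <linux/dma-fence-chain.h>
#include <linux/dma-resv.h>
#include <linux/slab.h>

static int example_cs_fill_excl(struct dma_resv *resv,
				struct dma_fence *cs_fence)
{
	struct dma_fence_chain *chain;
	struct dma_fence *prev;

	dma_resv_assert_held(resv);

	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
	if (!chain)
		return -ENOMEM;

	/* The chain node takes over these two references. A real
	 * implementation would use a monotonic per-context sequence
	 * number instead of 1. */
	prev = dma_fence_get(dma_resv_get_excl(resv));
	dma_fence_chain_init(chain, prev, dma_fence_get(cs_fence), 1);

	/* Hypothetical helper from the sketch above: install the chain
	 * as the new exclusive fence without dropping the shared fences. */
	dma_resv_replace_excl_fence(resv, &chain->base);
	dma_fence_put(&chain->base);

	return 0;
}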

Regards,
Christian.

Please correct me where I'm wrong here and definitely if I've
misrepresented anyone's positions.

Dave.


[1] https://lore.kernel.org/dri-devel/a1925038-5c3c-0193-1870-27488caa2577@xxxxxxxxx/T/#md800f00476ca1869a81b02a28cb2fabc1028c6be



