Re: handle exclusive fence similar to shared ones

On 07.06.21 at 10:58, Daniel Vetter wrote:
Hi Christian,

So unfortunately I got completely distracted with some i915 bugs and
fun last week, so I didn't get around to it.

On Sun, Jun 6, 2021 at 12:03 PM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
Hi Daniel,

as discussed, here are the patches which change the handling of the exclusive fence.

The main problem seems to have been the dma_resv_test_signaled() function, which ignored the exclusive fence when shared fences were present (a sketch below illustrates this). This was already rather inconsistent, since dma_resv_wait_timeout() takes the exclusive fence into account even if shared ones are present.

The second patch then fixes nouveau to also always take the exclusive fence into account.

The third one then removes the workaround in amdgpu around the VM page table clearing handling. Since I'm not sure whether there are other places which rely on the existing behavior, I will hold this one back for a while.
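
To make the inconsistency above concrete, here is a minimal sketch of the old test_signaled logic, reconstructed for illustration only (not the actual code in drivers/dma-buf/dma-resv.c; the RCU read lock and seqcount retry handling are omitted):

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>

/* Illustration only: the gist of the old behavior. */
static bool test_signaled_sketch(struct dma_resv *obj, bool test_all)
{
        unsigned int shared_count = 0;

        if (test_all) {
                struct dma_resv_list *list = dma_resv_shared_list(obj);

                if (list)
                        shared_count = list->shared_count;

                /* ... return false here if any shared fence is not
                 * yet signaled ... */
        }

        /*
         * The exclusive fence was only consulted when no shared
         * fences were checked -- while the wait path always takes
         * it into account as well.
         */
        if (!shared_count) {
                struct dma_fence *excl = dma_resv_excl_fence(obj);

                if (excl)
                        return dma_fence_is_signaled(excl);
        }

        return true;
}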

Is that what you had in mind as well?
I think, semantics-wise, something where we treat the exclusive
fence as an IPC mechanism that the kernel doesn't care much about
(exceptions apply), but more consistently count all access from
any CS as a shared fence. So in a way what you've done here, and also
what you did in the earlier series with setting the read/write
flags on shared fences.

Yeah, I think that this will work for me as well.

For the actual approach, what I've picked is a bit of what amdgpu does
plus what other drivers do with NO_IMPLICIT, but with the bugs fixed
(there's a bunch of them): essentially we try to always set the shared
fences, and exclusive fences are set additionally on top when the
implicit sync IPC calls for that. And on the dependency side we do
clever logic to only take the exclusive fence into account when required.
Currently for amdgpu this means introspecting the fence owner (there
are some nasty tricks needed there, I think, to make this work and
not be a security bug); for others that's done with the NO_IMPLICIT
flag (but again with some nasty corners there, which I think a bunch
of drivers get wrong).
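
As a rough sketch, that submission-side rule could look like the following; dma_resv_reserve_shared(), dma_resv_add_shared_fence() and dma_resv_add_excl_fence() are the existing dma_resv functions, everything else is made up for illustration:

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>

/* Illustration only: always publish the CS fence as a shared fence,
 * and set it as the exclusive fence on top when the submission is an
 * implicit-sync write.  Caller must hold the dma_resv lock. */
static int publish_cs_fence(struct dma_resv *resv, struct dma_fence *fence,
                            bool implicit_write)
{
        int r;

        r = dma_resv_reserve_shared(resv, 1);
        if (r)
                return r;

        dma_resv_add_shared_fence(resv, fence);

        /* Note that under the current dma_resv rules this also drops
         * the shared fences again -- lifting exactly that restriction
         * is what point 1 of the proposal below is about. */
        if (implicit_write)
                dma_resv_add_excl_fence(resv, fence);

        return 0;
}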

For amdgpu I was pondering the following idea last week to make it behave the same as the other drivers:

1. We allow setting the exclusive fence without touching the shared fences.
    As far as I understand it, this is also part of your idea above.

2. During command submission amdgpu uses a dma_fence_chain node to chain together the new CS with the existing exclusive fence.

3. During command synchronization amdgpu takes a look at the exclusive fence and walks the dma_fence_chain history.
    Submissions from the same process (the owner) are not synced to (i.e. the same behavior as today), but as soon as we see something which doesn't fit into the amdgpu CS model, we sync to the remaining chain.

That would give us both: keeping the current amdgpu CS behavior (which we can then extend) as well as setting the exclusive fence according to the DMA-buf rules.
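
For step 3, the walk could look something like this, built on the existing dma_fence_chain_for_each() iterator; amdgpu_fence_owner() is a hypothetical placeholder for however the fence owner gets introspected:

#include <linux/dma-fence-chain.h>
#include "amdgpu_sync.h"

/* Illustration only: skip chain links submitted by the same owner,
 * sync to the first link that doesn't fit the amdgpu CS model. */
static int sync_to_excl_chain(struct amdgpu_sync *sync,
                              struct dma_fence *excl, void *owner)
{
        struct dma_fence *iter;
        int r = 0;

        dma_fence_chain_for_each(iter, excl) {
                struct dma_fence_chain *chain = to_dma_fence_chain(iter);
                struct dma_fence *f = chain ? chain->fence : iter;

                /* Same owner: not synced to, same behavior as today. */
                if (amdgpu_fence_owner(f) == owner) /* hypothetical */
                        continue;

                /* Doesn't fit the CS model: sync to this link, which
                 * also covers everything further down the chain. */
                r = amdgpu_sync_fence(sync, iter);
                dma_fence_put(iter);
                break;
        }

        return r;
}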

There are two reasons I'm leaning more in that direction:
- The annoying thing is that the audit on the dependency side is a lot
trickier, since everyone rolls their own dependency handling.

Yes, absolutely agree. That's why I said we need use-case-based functionality here.

In other words, what we need is something like a dma_resv_for_each_sync_fence(for_write) macro.

E.g. drivers then only do something like:

dma_resv_for_each_sync_fence(resv, for_write, fence)
    driver_specific_syncing_to_fence(fence);

And not every driver calling the underlying functions on its own and then doing whatever it pleases.
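
No such macro exists today, but sketched as a callback-based helper (name and semantics made up here; assume the dma_resv lock is held):

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>

/* Hypothetical, illustration only: invoke the callback once for
 * every fence a new submission has to sync to.  Readers only need
 * the exclusive fence; writers additionally need all shared fences. */
static void dma_resv_for_each_sync_fence(struct dma_resv *resv,
                                         bool for_write,
                                         void (*cb)(struct dma_fence *, void *),
                                         void *data)
{
        struct dma_resv_list *list = dma_resv_shared_list(resv);
        struct dma_fence *excl = dma_resv_excl_fence(resv);
        unsigned int i;

        if (excl)
                cb(excl, data);

        if (!for_write || !list)
                return;

        for (i = 0; i < list->shared_count; ++i)
                cb(rcu_dereference_protected(list->shared[i],
                                             dma_resv_held(resv)), data);
}

A driver's dependency handling would then reduce to one such call with its driver_specific_syncing_to_fence() as the callback, instead of open-coding the exclusive-vs-shared decision.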

If we don't change (for now at least) the rules around dma_resv then an
oversight in the audit isn't going to be a huge problem.
- Wording becomes inconsistent: an exclusive fence which is also a
shared one is a bit confusing. I think it's better if we stick to the
current rules for dma_resv and change the semantics we want in drivers
(I think that's doable, maybe at some code cost; e.g. Jason's import
ioctl would be simpler with your changed rules, but still doable with
the current dma_resv rules). And then when we have that, we figure out
what to change about the dma_resv struct/rules.

But then let's at least do the minimal change so that we can get amdgpu in line with all the other drivers, like I outlined above.

We can keep that as a hack in amdgpu if that makes you feel better. Chaining the exclusive fences together is roughly four times slower than the shared approach, but I think that's negligible compared to all the other stuff we do.

Regards,
Christian.

Wrt the patches: Good thing is that what you change here and what I've
found thus far is 100% not overlapping, so at least we didn't waste
time auditing the same code :-)

Cheers, Daniel