On 15.03.2018 at 10:20, Daniel Vetter wrote:
On Tue, Mar 13, 2018 at 06:20:07PM +0100, Christian König wrote:
[SNIP]
Take a look at the DOT graphs for atomic I did a while ago. I think we
could have a formidable competition for who's doing the worst diagrams :-)
Thanks, going to give that a try.
[SNIP]
amdgpu: Expects that you never hold any of the heavyweight locks while
waiting for a fence (since gpu resets will need them).
i915: Happily blocks on fences while holding all kinds of locks, expects
gpu reset to be able to recover even in this case.
In this case I can comfort you: the locks amdgpu needs to grab during
GPU reset are the reservation locks of the VM page tables. I strongly
doubt that i915 will ever hold those.
We could still run into problems if thread A holds lock 1 and tries to
take lock 2, while i915 holds lock 2 and our reset path needs lock 1.
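To make that three-party chain concrete, here is a minimal sketch of it as a wait-for graph with a plain depth-first cycle search. All node names are made up for illustration, and the two edges involving the fence are an assumption: they model i915 blocking on a fence while holding lock 2, and that fence only signalling once the amdgpu reset has finished.

```python
# Toy wait-for graph: an edge X -> Y means "X cannot make progress until Y does".
# Node names are hypothetical labels, not real kernel locks or threads.

def find_cycle(waits_for):
    """Return a list of nodes forming a cycle, or None if there is none."""
    visiting = []   # current DFS path, in order
    done = set()    # nodes fully explored without finding a cycle

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Back edge: the cycle is the path from the first occurrence onward.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in waits_for:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


waits_for = {
    "thread_A": ["lock_2"],       # A holds lock 1 and wants lock 2
    "lock_2": ["i915_waiter"],    # lock 2 is held by an i915 thread
    "i915_waiter": ["fence"],     # assumption: that thread blocks on a fence
    "fence": ["amdgpu_reset"],    # assumption: fence signals only after reset
    "amdgpu_reset": ["lock_1"],   # the reset path needs lock 1
    "lock_1": ["thread_A"],       # lock 1 is held by thread A
}

cycle = find_cycle(waits_for)
print(" -> ".join(cycle) if cycle else "no cycle")
```

Neither driver holds the other's lock directly here; the cycle only closes through thread A and the fence, which is why it is easy to miss by reading one driver at a time.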
[SNIP]
Yes, except for fallback paths and bootup self tests we simply never wait
for fences while holding locks.
That's not what I meant by "are you sure". Did you enable the
cross-release stuff (after patching the bunch of leftover core kernel
issues still present), annotate dma_fence with the cross-release stuff,
run a bunch of multi-driver (amdgpu vs i915) dma-buf sharing tests and
weep?
Ok, what exactly do you mean by cross-release checking?
I didn't do the full thing yet, but just within i915 we've found tons of
small little deadlocks we never really considered thanks to cross release,
and that wasn't even including the dma_fence annotation. Luckily nothing
that needed a full-on driver redesign.
I guess I need to ping core kernel maintainers about cross-release again.
I'd much prefer if we could validate ->invalidate_mapping and the
locking/fence dependency issues using that, instead of me having to read
and understand all the drivers.
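As a rough illustration of the kind of dependency such an annotation would record, here is a toy model. It is not lockdep's implementation, and the class, method, and lock names are all hypothetical. It records "lock held while waiting on a fence" and "lock taken by the path that signals the fence" as edges, then checks for a cycle.

```python
from collections import defaultdict


class DepTracker:
    """Toy dependency recorder; a simplified model, not lockdep's data structures."""

    def __init__(self):
        self.edges = defaultdict(set)  # node -> set of nodes it depends on

    def wait_on_fence(self, fence, held_locks):
        # A lock held across the wait cannot be released until the fence signals.
        for lock in held_locks:
            self.edges[lock].add(fence)

    def signal_path_takes(self, fence, lock):
        # The fence cannot signal until the signalling path has acquired `lock`.
        self.edges[fence].add(lock)

    def has_cycle(self):
        state = {}  # node -> "visiting" or "done"

        def visit(node):
            if state.get(node) == "done":
                return False
            if state.get(node) == "visiting":
                return True  # back edge: dependency cycle
            state[node] = "visiting"
            if any(visit(n) for n in list(self.edges.get(node, ()))):
                return True
            state[node] = "done"
            return False

        return any(visit(n) for n in list(self.edges))


tracker = DepTracker()
# Hypothetical importer: waits on a fence while holding a reservation lock.
tracker.wait_on_fence("fence", ["vm_resv"])
safe_so_far = tracker.has_cycle()

# Hypothetical exporter: its reset path needs the same lock before signalling.
tracker.signal_path_takes("fence", "vm_resv")
deadlock_possible = tracker.has_cycle()
```

The point of the sketch is that the two halves of the cycle are recorded in different contexts, possibly in different drivers, so a tool that only tracks locks acquired and released by the same thread would see neither half as a problem.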
[SNIP]
I fear that with the ->invalidate_mapping callback (which inverts the
control flow between importer and exporter) and tying dma_fences into all
this it will be a _lot_ worse. And I'm definitely too stupid to understand
all the dependency chains without the aid of lockdep and a full test suite
(we have a bunch of amdgpu/i915 dma-buf tests in igt btw).
Yes, that is also something I worry about.
Regards,
Christian.