On Thu, Mar 15, 2018 at 10:56 AM, Christian König <ckoenig.leichtzumerken@xxxxxxxxx> wrote: > Am 15.03.2018 um 10:20 schrieb Daniel Vetter: >> >> On Tue, Mar 13, 2018 at 06:20:07PM +0100, Christian König wrote: >> [SNIP] >> Take a look at the DOT graphs for atomic I've done a while ago. I think we >> could make a formidable competition for who's doing the worst diagrams :-) > > > Thanks, going to give that a try. > >> [SNIP] >> amdgpu: Expects that you never hold any of the heavywheight locks while >> waiting for a fence (since gpu resets will need them). >> >> i915: Happily blocks on fences while holding all kinds of locks, expects >> gpu reset to be able to recover even in this case. > > > In this case I can comfort you, the looks amdgpu needs to grab during GPU > reset are the reservation lock of the VM page tables. I have strong doubt > that i915 will ever hold those. Ah good, means that very likely there's at least no huge fundamental design issue that we run into. > Could be that we run into problems because Thread A hold lock 1 tries to > take lock 2, then i915 holds 2 and our reset path needs 1. Yeah that might happen, but lockdep will catch those, and generally those cases can be fixed with slight reordering or re-annotating of the code to avoid upsetting lockdep. As long as we don't have a full-on functional dependency (which is what I've feared). >> [SNIP] >>> >>> Yes, except for fallback paths and bootup self tests we simply never wait >>> for fences while holding locks. >> >> That's not what I meant with "are you sure". Did you enable the >> cross-release stuff (after patching the bunch of leftover core kernel >> issues still present), annotate dma_fence with the cross-release stuff, >> run a bunch of multi-driver (amdgpu vs i915) dma-buf sharing tests and >> weep? > > > Ok, what exactly do you mean with cross-release checking? Current lockdep doesn't spot deadlocks like the below: thread A: holds mutex, waiting for completion. thread B: acquires mutex before it will ever signal the completion A is waiting for ->deadlock cross-release lockdep support can catch these through new fancy annotations. Similar waiter/signaller annotations exists for waiting on workers and anything else, and it would be a perfect fit for waiter/signaller code around dma_fence. lwn has you covered a usual: https://lwn.net/Articles/709849/ Cheers, Daniel >> I didn't do the full thing yet, but just within i915 we've found tons of >> small little deadlocks we never really considered thanks to cross release, >> and that wasn't even including the dma_fence annotation. Luckily nothing >> that needed a full-on driver redesign. >> >> I guess I need to ping core kernel maintainers about cross-release again. >> I'd much prefer if we could validate ->invalidate_mapping and the >> locking/fence dependency issues using that, instead of me having to read >> and understand all the drivers. > > [SNIP] >> >> I fear that with the ->invalidate_mapping callback (which inverts the >> control flow between importer and exporter) and tying dma_fences into all >> this it will be a _lot_ worse. And I'm definitely too stupid to understand >> all the dependency chains without the aid of lockdep and a full test suite >> (we have a bunch of amdgpu/i915 dma-buf tests in igt btw). > > > Yes, that is also something I worry about. > > Regards, > Christian. -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch