On Mon, Apr 24, 2017 at 02:03:25PM +0100, Tvrtko Ursulin wrote:
>
> On 19/04/2017 10:41, Chris Wilson wrote:
> >Track the latest fence waited upon on each context, and only add a new
> >asynchronous wait if the new fence is more recent than the recorded
> >fence for that context. This requires us to filter out unordered
> >timelines, which are noted by DMA_FENCE_NO_CONTEXT. However, in the
> >absence of a universal identifier, we have to use our own
> >i915->mm.unordered_timeline token.
>
> (._.), a bit later... @_@!
>
> What does this fixes and is the complexity worth it?

It's a recovery of the optimisation that we used to have from the
initial multiple engine semaphore synchronisation: that of avoiding
repeating the same synchronisation barriers.

In the current setup, the cost of repeated fence synchronisation is
hidden. It just causes a tight loop between

    /<---------------------------------------------\
    |                                              ^
    i915_sw_fence_complete -> i915_sw_fence_commit ->|

and extra depth in the dependency trees, which is generally not
observed in normal usage.

When you know what you are looking for, the reduction of all those
atomic ops from underneath hardirq is definitely worth it, even for
fairly simple operations, and there tends to be repetition from all the
buffers being tracked between requests (and clients).

Using a seqno map avoids the cost of tracking fences (i.e. keeping old
fences forever) and allows it to be kept on the timeline, rather than
the request itself (a ht under the request can squash simple repeats,
but using the timeline is more complete).

It takes 2 small routines to implement a compressed radixtree, which is
simple compared to having to accommodate RCU walkers!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
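
A minimal sketch of the per-context seqno filtering described above, for illustration only. It assumes a small fixed-size open-addressed table in place of the compressed radixtree, and every name in it (sync_map, sync_is_later, sync_set, await_fence, UNORDERED_CONTEXT) is hypothetical rather than taken from the i915 code. UNORDERED_CONTEXT merely stands in for the unordered-timeline token.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not the i915 implementation. */
#define SYNC_SLOTS 64u                  /* must be a power of two */
#define UNORDERED_CONTEXT UINT64_MAX    /* stand-in for the unordered token */

struct sync_slot {
	uint64_t context;   /* fence context that owns this slot */
	uint32_t seqno;     /* latest seqno already waited upon */
	bool used;
};

struct sync_map {
	struct sync_slot slot[SYNC_SLOTS];
};

/* Wraparound-safe test: is seqno a at or after seqno b? */
static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

static void sync_map_init(struct sync_map *map)
{
	assert(map);
	memset(map, 0, sizeof(*map));
}

/* Linear probe for the slot of context; NULL if the table is full. */
static struct sync_slot *sync_lookup(struct sync_map *map, uint64_t context)
{
	for (uint32_t i = 0; i < SYNC_SLOTS; i++) {
		struct sync_slot *s = &map->slot[(context + i) & (SYNC_SLOTS - 1)];

		if (!s->used || s->context == context)
			return s;
	}
	return NULL;
}

/* True if an earlier wait on this context already covers seqno. */
static bool sync_is_later(struct sync_map *map, uint64_t context, uint32_t seqno)
{
	struct sync_slot *s = sync_lookup(map, context);

	return s && s->used && seqno_passed(s->seqno, seqno);
}

/* Record the most recent seqno waited upon for this context. */
static void sync_set(struct sync_map *map, uint64_t context, uint32_t seqno)
{
	struct sync_slot *s = sync_lookup(map, context);

	if (!s)
		return; /* table full: we only lose the chance to skip a repeat */
	if (s->used && seqno_passed(s->seqno, seqno))
		return; /* already tracking an equal or newer seqno */

	s->context = context;
	s->seqno = seqno;
	s->used = true;
}

/* Returns true if a new asynchronous wait has to be added. */
static bool await_fence(struct sync_map *map, uint64_t context, uint32_t seqno)
{
	if (context == UNORDERED_CONTEXT)
		return true;    /* unordered timelines cannot be filtered */
	if (sync_is_later(map, context, seqno))
		return false;   /* repeat of a barrier we already have */

	sync_set(map, context, seqno);
	return true;
}
```

In this sketch, awaiting (context 1, seqno 10) twice adds only one wait, and a subsequent await on seqno 5 of the same context is skipped as well. Only the last seqno per context is stored, so no old fences need to be kept alive.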