On Tue, Nov 01, 2016 at 09:20:33AM +0000, Tvrtko Ursulin wrote: > > On 01/11/2016 09:06, Chris Wilson wrote: > >On Tue, Nov 01, 2016 at 08:54:09AM +0000, Tvrtko Ursulin wrote: > >> > >>On 01/11/2016 08:50, Chris Wilson wrote: > >>>On Tue, Nov 01, 2016 at 08:41:01AM +0000, Tvrtko Ursulin wrote: > >>>> > >>>>On 31/10/2016 10:26, Chris Wilson wrote: > >>>>>With full-ppgtt one of the main bottlenecks is the lookup of the VMA > >>>>>underneath the object. For execbuf there is merit in having a very fast > >>>>>direct lookup of ctx:handle to the vma using a hashtree, but that still > >>>>>leaves a large number of other lookups. One way to speed up the lookup > >>>>>would be to use a rhashtable, but that requires extra allocations and > >>>>>may exhibit poor worse case behaviour. An alternative is to use an > >>>>>embedded rbtree, i.e. no extra allocations and deterministic behaviour, > >>>>>but at the slight cost of O(lgN) lookups (instead of O(1) for > >>>>>rhashtable). The major of such tree will be very shallow and so not much > >>>>>slower, and still scales much, much better than the current unsorted > >>>>>list. > >>>>> > >>>>>References: https://bugs.freedesktop.org/show_bug.cgi?id=87726 > >>>>>Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>>> > >>>>I suggest leaving this out of the mini-series which fixes the > >>>>recently introduced bugs. > >>> > >>>Just review it now, it's a two year old bug that was impacting the test > >>>cases I was running... :-p Then we won't have to review it next week ;) > >> > >>I'll review it, just split it out please so the CI fire can be > >>extinguished first. > > > >Sure, I'm pushing the fixes as soon as they've been baked. There's a > >regression fix that builds upon this patch for multiple timelines (the > >linear walk in reservation_object is particularly nasty for scaling). > > Which one is that, the reservation_object_get_fences_rcu in > i915_gem_object_wait_reservation? Mostly reservation_object_add_shared_fence() The reservation object is not autopruning, those fences stay there until replaced. So as an object is used by more and more contexts (such as many GL clients sharing the same resource) that shared array keeps on growing, and we have to walk over it on every execbuf, both to update the fence and indeed often to compute the await. We can use a radixtree/idr or a rhashtable here since they share the RCU share lookup required for reservation_object (and couple in the seqlock for update tracking), the stumbling point is the that reservation_object_reserve_shared_fence() guarrantees the next add_shared_fence() will not fail (ENOMEM). Both idr/radixtree have preload, but both use per-cpu caches and so require preemption disabling from the point of reservation to use; i.e. not suitable. Changing them to preload onto a specific tree seems reasonably easy except for convincing core of their merit. (There changing idr looks like it would be easier.) radixtree is easier to use than idr, but iterating the radixtree for (get_fences_rcu) is not as cheap as I expect. :| -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx