Re: [PATCH 07/27] drm/i915: Squash repeated awaits on the same fence

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Thu, 27 Apr 2017 10:47:16 +0100

On Thu, Apr 27, 2017 at 10:20:36AM +0100, Tvrtko Ursulin wrote:
> 
> On 26/04/2017 23:22, Chris Wilson wrote:
> >On Wed, Apr 26, 2017 at 07:56:14PM +0100, Chris Wilson wrote:
> >>On Wed, Apr 26, 2017 at 01:13:41PM +0100, Tvrtko Ursulin wrote:
> >>>I was thinking of exactly the same thing as this patch does, u64
> >>>context id as key, u32 seqnos (wrapped in a container with
> >>>hlist_node).
> >>
> >>#define NSYNC 32
> >>struct intel_timeline_sync { /* kmalloc-256 slab */
> >>	struct hlist_node node;
> >>        u64 prefix;
> >>	u32 bitmap;
> >>	u32 seqno[NSYNC];
> >>};
> >>DECLARE_HASHTABLE(sync, 7);
> >>
> >>If I squint, the numbers favour the idr. ;)
> >
> >Hmm, it didn't take much to start running into misery with a static ht.
> >I know my testing is completely artificial but I am not going to be
> >happy with a static size, it will always be too big or too small and
> >never just Goldilocks.
> 
> Oh what a pity, implementation is so much smaller. What kind of
> misery was it? I presume not longer below the noise floor? With more
> than three buckets?

Yup, after realising the flaw in my userspace test, I was able to
hit intel_timeline_sync_is_later() more often. The difference between
idr/ht in that test is still less than the difference in not squashing,
but it becomes easier to realise a difference (the moment when it was
spending over 90% in that function walking the hash chain was the last
straw).

> If no other choice I'll tackle the review. Hopefully won't get lost
> in all the shifts, leafs, branches and prefixes. :)

You may well win the ht argument when it comes to an RCU compatible
variant for reservation_object; the relative simplicity in walking the
rcu chains is much more reassuring than arguing rcu correctness of
parent pointers and manual stacks for iterators.

Still a fixed sized ht is going to have long chains for igt, and
reservation_objects are very common so we can't go mad in giving each a
large number of buckets. The biggest complexity for reservation_object
is that it offers guaranteed insertion (along with a u64 index that
rules out lib/radixtree, rhashtable).

And I hope one day refcounting becomes reasonably cheap again, since
sadly it's unavoidable in reservation_object (afaict).

> Regards,
> 
> Tvrtko
> 
> P.S. GEM_STATS you mention in the other reply - what are you
> referring to with that? The idea to expose queue depths and possibly
> more via some interface? If so prototyping that is almost next on my
> TODO list.

I was thinking of intrusive debugging stats that we may want to keep
around and conditionally compile in.

Most statistics should not be for public consumption :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx