On Wed, Apr 26, 2017 at 01:13:41PM +0100, Tvrtko Ursulin wrote:
> 
> On 26/04/2017 12:18, Chris Wilson wrote:
> >On Wed, Apr 26, 2017 at 11:54:08AM +0100, Tvrtko Ursulin wrote:
> >>>>>+/* struct intel_timeline_sync is a layer of a radixtree that maps a u64 fence
> >>>>>+ * context id to the last u32 fence seqno waited upon from that context.
> >>>>>+ * Unlike lib/radixtree it uses a parent pointer that allows traversal back to
> >>>>>+ * the root. This allows us to access the whole tree via a single pointer
> >>>>>+ * to the most recently used layer. We expect fence contexts to be dense
> >>>>>+ * and most reuse to be on the same i915_gem_context but on neighbouring
> >>>>>+ * engines (i.e. on adjacent contexts) and reuse the same leaf, a very
> >>>>>+ * effective lookup cache. If the new lookup is not on the same leaf, we
> >>>>>+ * expect it to be on the neighbouring branch.
> >>>>>+ *
> >>>>>+ * A leaf holds an array of u32 seqno, and has height 0. The bitmap field
> >>>>>+ * allows us to store whether a particular seqno is valid (i.e. allows us
> >>>>>+ * to distinguish unset from 0).
> >>>>>+ *
> >>>>>+ * A branch holds an array of layer pointers, and has height > 0, and always
> >>>>>+ * has at least 2 layers (either branches or leaves) below it.
> >>>>>+ *
> >>>>>+ */
> >>>>
> >>>>@_@ :)
> >>>>
> >>>>Ok, so a map of u64 to u32. We can't use IDR or radixtree directly
> >>>>because of u64 keys. :( How about a hash table? It would be much
> >>>>simpler to review. :) Seriously, if it would perform close enough it
> >>>>would be a much, much simpler implementation.
> >>>
> >>>You want a resizable hashtable. rht is appallingly slow, so you want a
> >>>custom resizable ht. They are not as simple as this codewise ;)
> >>>(Plus a compressed radixtree is part of my plan for scalability
> >>>improvements for struct reservation_object.)
> >>
> >>Why resizable? I was thinking a normal one. If at any given time we
> >>have an active set of contexts, or at least lookups are, as you say
> >>below, to neighbouring contexts, that would mean we are talking
> >>about lookups to different hash buckets. And for the typical
> >>working set we would expect many collisions, so longer lists in each
> >>bucket? So maybe NUM_ENGINES * some typical load constant number of
> >>buckets would not be that bad?
> >
> >Consider a long running display server that will accumulate tens of
> >thousands of clients in its lifetime, each with their own contexts that
> >get shared by passing around fences/framebuffers. (Or, on a shorter
> >scale, any of the context stress tests in igt.) Due to the
> >non-recycling of the context ids, we can grow to very large tables -
> >but we have no knowledge of which contexts are no longer required.
> >
> >To compensate we need to occasionally prune the sync points. For a ht
> >we could just scrap it; for an idr, we could store last use and delete
> >stale leaves.
> 
> Hm, pruning yes... but you don't have pruning in this patch either,
> so that's something which needs to be addressed either way.

I know, review is great for hindsight. The iterator/remove for the
compressed idr is going to be on the uglier side of the insertion.
Ugly enough to be a separate patch.
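For reference, the layer described in the comment quoted above has
roughly the following shape. The field names and the fan-out of 16
(4 bits of the u64 context id consumed per level) are illustrative
assumptions, not necessarily what the patch itself uses:

#include <linux/types.h>

/* One layer of the compressed tree: maps u64 fence contexts to the
 * last u32 seqno waited upon from each context.
 */
struct intel_timeline_sync {
	u64 prefix;		/* bits of the context id above this layer */
	unsigned int height;	/* 0 for a leaf, > 0 for a branch */
	u16 bitmap;		/* leaf: which seqno slots hold a valid value
				 * (distinguishes "unset" from seqno 0);
				 * branch: which children are present
				 */
	struct intel_timeline_sync *parent;
				/* unlike lib/radixtree, lets us walk back to
				 * the root from the cached most-recently-used
				 * layer
				 */
	union {
		u32 seqno[16];				/* height == 0 */
		struct intel_timeline_sync *child[16];	/* height > 0 */
	};
};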
> >But first we have a question of how many buckets do we give the
> >static ht? Most processes will be sharing between 2 contexts (render
> >context, presentation context), except for a display server which may
> >have 10-100 clients - and that is possibly where eliminating repeated
> >syncs is going to be most valuable. That suggests 256 buckets for
> >every timeline (assuming just a pair of engines across shared
> >contexts). Overkill for the majority, and going to be miserable in
> >stress tests.
> >
> >Finally, what do you store in the ht? Fences are the obvious
> >candidate, but need reaping. Or you just create a ht for the request
> >and only squash repeated fences inside a single request - that doesn't
> >benefit from timeline tracking and squashing between requests (but
> >does avoid keeping fences around forever). Hence why I went with
> >tracking seqno. To avoid allocations for the ht nodes, we could create
> >an open-addressed ht with the {context, seqno} embedded in it. It
> >would be efficient, but needs online resizing and is a fair chunk of
> >new code.
> 
> I was thinking of exactly the same thing as this patch does: u64
> context id as key, u32 seqnos (wrapped in a container with
> hlist_node).

Hmm, a hashed radixtree. I did read something about a hybrid approach.

> Ok, so the key is pruning to keep the display server scenario in check.
> 
> Free the key from i915_fence_release?

Too early; it's the timeline (and the syncs along it) that is
interesting. For our own contexts we can hook into context-close, but
we still have some foreign dma-fence contexts to worry about. I was
thinking of walking all timelines from the idle_worker, and possibly
forcibly pruning across suspend.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
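For completeness, the fixed-size hash table considered above (u64
fence context as the key, the last seqno kept in a chained entry, and
"NUM_ENGINES * some typical load constant" buckets) might look roughly
like the sketch below. The names, the bucket count and the helpers are
illustrative only, not from any posted patch:

#include <linux/hashtable.h>
#include <linux/slab.h>

struct timeline_sync_entry {
	struct hlist_node node;
	u64 context;	/* fence context id, the key */
	u32 seqno;	/* last seqno waited upon from that context */
};

#define TIMELINE_SYNC_BITS 6	/* 64 buckets, fixed at build time */
static DEFINE_HASHTABLE(timeline_sync, TIMELINE_SYNC_BITS);

static struct timeline_sync_entry *timeline_sync_find(u64 context)
{
	struct timeline_sync_entry *e;

	hash_for_each_possible(timeline_sync, e, node, context)
		if (e->context == context)
			return e;

	return NULL;
}

/* Record that we have waited upon @seqno from @context. */
static int timeline_sync_set(u64 context, u32 seqno)
{
	struct timeline_sync_entry *e;

	e = timeline_sync_find(context);
	if (e) {
		e->seqno = seqno;
		return 0;
	}

	e = kmalloc(sizeof(*e), GFP_KERNEL);
	if (!e)
		return -ENOMEM;

	e->context = context;
	e->seqno = seqno;
	hash_add(timeline_sync, &e->node, context);
	return 0;
}

Note that this sketch only covers the lookup/insert side; pruning stale
entries is exactly the open problem the thread identifies.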