Re: [PATCH 43/46] drm/i915: Allocate a status page for each timeline

John Harrison <John.C.Harrison@xxxxxxxxx> · Tue, 15 Jan 2019 10:17:21 -0800




On 1/15/2019 01:50, Chris Wilson wrote:
Quoting John Harrison (2019-01-15 00:56:13)
On 1/7/2019 03:55, Chris Wilson wrote:
+static int alloc_hwsp(struct i915_timeline *timeline)
+{
+     struct drm_i915_private *i915 = timeline->i915;
+     struct i915_vma *vma;
+     int offset;
+
+     mutex_lock(&i915->gt.timeline_lock);
+
+restart:
+     offset = find_first_cacheline(i915);
+     if (offset == NBITS && i915->gt.timeline_hwsp) {
+             i915_vma_put(i915->gt.timeline_hwsp);
+             i915->gt.timeline_hwsp = NULL;
+     }
+
+     vma = i915->gt.timeline_hwsp;
+     if (!vma) {
+             struct drm_i915_gem_object *bo;
+
+             /* Drop the lock before allocations */
+             mutex_unlock(&i915->gt.timeline_lock);
+
+             BUILD_BUG_ON(NBITS * CACHELINE_BYTES > PAGE_SIZE);
+             bo = i915_gem_object_create_internal(i915, PAGE_SIZE);
+             if (IS_ERR(bo))
+                     return PTR_ERR(bo);
+
+             i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
+
+             vma = i915_vma_instance(bo, &i915->ggtt.vm, NULL);
+             if (IS_ERR(vma))
+                     return PTR_ERR(vma);
+
+             mutex_lock(&i915->gt.timeline_lock);
+             if (i915->gt.timeline_hwsp) {
+                     i915_gem_object_put(bo);
+                     goto restart;
+             }
+
+             i915->gt.timeline_hwsp = vma;
+             i915->gt.timeline_free = ~0ull;
+             offset = 0;
+     }
+
+     i915->gt.timeline_free &= ~BIT_ULL(offset);
+
+     timeline->hwsp_ggtt = i915_vma_get(vma);
+     timeline->hwsp_offset = offset * CACHELINE_BYTES;
+
+     mutex_unlock(&i915->gt.timeline_lock);
+
+     return 0;
+}
If I'm reading this correctly then gt.timeline_hwsp/free is the a cached
copy of the most recently allocated but not yet filled bank of seqno
locations. When it gets full, the i915->gt reference gets dropped and a
new page is allocated and used up line by line. Meanwhile, each timeline
has it's own private reference to the page so dropping the i915->gt
reference is safe. And once the last timeline using a given page is
freed, the last reference to that page will be dropped and so the page
itself will also be freed. If a timeline is freed before the currently
cached page is filled, then that timeline's slot will be released and
re-used by the next timeline to be created.

But what about the scenario of a long running system with a small but
growing number of persistent tasks interspersed with many short lived
tasks? In that case, you would end up with many sparsely populated pages
that whose free slots will not get re-used. You could have a linked list
of cached pages. When a page is filled, move it to a 'full' list. When a
timeline is freed, if it's page was on the 'full' list, clear the slot
and move it back to the 'available' list.
Yes. My thinking was a plain slab cache was a quick-and-dirty
improvement over a page-per-timeline. And a freelist would be the next
step.

Or is the idea that a worst case of a single page vma allocation per
timeline is the least of our worries if there is an ever growing number
of timelines/contexts/users in the system?
Nah, it was just an attempt to quickly reduce the number of allocations,
where the worst case of one page+vma per timeline was the starting
point.

We should break this patch down into 1) one-page-per-timeline, 2) slab
cache, 3) free list 4) profit.

At other times we have been wanting to be able to suballocate pages,
something to keep in mind would be extending this to arbitrary cacheline
allocations.
The multi-stage approach sounds good. Keep things simple in this patch 
and then improve the situation later. One thing to be careful of with a 
cacheline allocator would be make sure whatever is being converted 
wasn't using full pages for security reasons. I.e. a page can be private 
to a process, a cacheline will be shared by many. I guess that would 
only really apply to allocations being passed to user land as the kernel 
is considered secure? Or can a user batch buffer write to arbitrary 
locations within the ppHWSP and thereby splat someone else's seqno?

+     if (global_hwsp) {
+             timeline->hwsp_ggtt = i915_vma_get(global_hwsp);
+             timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
+     } else {
+             err = alloc_hwsp(timeline);
+             if (err)
+                     return err;
+     }
+
+     vaddr = i915_gem_object_pin_map(timeline->hwsp_ggtt->obj, I915_MAP_WB);
+     if (IS_ERR(vaddr)) { /* leak the cacheline, but will clean up later */
Can you explain this comment more? Where/when is the later?
On failure here, the cacheline is still marked as allocated in the slab,
but the reference to the page is released. So the backing page will be
released when everyone else finally drops their reference.

Just laziness, since we have the ability to return the cacheline later
on...
Meaning the actual leak is the bit in 'i915->gt.timeline_free' that says 
this cacheline can or can't be used for the next allocation? Presumably 
you could do the bit map munging in the case that 'global_hwsp' is null, 
but the code would certainly be messier for not a lot of gain.


@@ -2616,7 +2628,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
               goto error_deref_obj;
       }
   
-     timeline = i915_timeline_create(ctx->i915, ctx->name);
+     timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
Why does this use the global HWSP rather than a per context one?
.global_hwsp = NULL => it allocates its own HWSP.

Were you thinking of intel_engine_setup_common() which is still using
the global HWSP at this point in time?
Doh. Brain fart. Presumably the engine one will disappear completely? Or 
is it still needed for legacy mode?


-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx