Re: [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Wed, 9 Jan 2019 12:51:53 +0000

On 09/01/2019 12:06, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2019-01-09 11:56:15)

On 07/01/2019 15:29, Chris Wilson wrote:
In the continual quest to reduce the amount of global work required when
submitting requests, replace i915_retire_requests() after allocation
failure to retiring just our ring.

References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
   drivers/gpu/drm/i915/i915_request.c | 33 +++++++++++++++++++++--------
   1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 1e158eb8cb97..9ba218c6029b 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
       return NOTIFY_DONE;
   }
   
+static noinline struct i915_request *
+i915_request_alloc_slow(struct intel_context *ce)
+{
+     struct intel_ring *ring = ce->ring;
+     struct i915_request *rq, *next;
+
+     list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
+             /* Ratelimit ourselves to prevent oom from malicious clients */
+             if (&next->ring_link == &ring->request_list) {

list_is_last(next, &ring->request_list) ?

Tried it (needs list_is_last(&next->ring_link,...)), but I slightly
preferred not implying that next was a valid request here, and keeping
the matching form to list termination.
  
+                     cond_synchronize_rcu(rq->rcustate);
+                     break; /* keep the last objects for the next request */
+             }
+
+             if (!i915_request_completed(rq))
+                     break;
+
+             /* Retire our old requests in the hope that we free some */
+             i915_request_retire(rq);
The RCU wait against the last submitted rq is also gone. Now it only
sync against the next to last rq, unless there is more than two live
requests. Is this what you intended?

Nah, I was trying to be too smart, forgetting that we didn't walk the
entire list. The RCU wait is against to the last rq (since next is the
list head at that point, so unchanged wrt to using list_last_entry), but
we break on seeing a busy request, so no ratelimiting if you keep the GPU
busy (not quite as intended!).
  
If the ring timeline has is a list of r-r-r-R-R-R (r=completed,
R=pending) then it looks like it will not sync on anything.

And if the list is r-r-r-r it will sync against a completed rq. Which I
hope is a no-op, but still, the loop logic looks potentially dodgy.

It also has a higher level vulnerability to one hog timeline starving
the rest I think.

Also? Other than forgetting the earlier break preventing the throtting,
what else do you see wrong with throttling along a timeline/ring?

I was on the wrong track when thinking about the removal of global 
retire. I though the hog on one timeline would be able to starve the 
other timeline, but the hog will eventually hit the allocation failure 
and sync against itself. I think it's fine.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx