Re: [PATCH v2] drm/i915: Don't dump umpteen thousand requests

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Tue, 24 Apr 2018 10:40:57 +0100

On 24/04/2018 10:33, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-04-24 10:27:23)

On 24/04/2018 09:16, Chris Wilson wrote:
If we have more than a few, possibly several thousand request in the
queue, don't show the central portion, just the first few and the last
being executed and/or queued. The first few should be enough to help
identify a problem in execution, and most often comparing the first/last
in the queue is enough to identify problems in the scheduling.

We may need some fine tuning to set MAX_REQUESTS_TO_SHOW for common
debug scenarios, but for the moment if we can avoiding spending more
than a few seconds dumping the GPU state that will avoid a nasty
livelock (where hangcheck spends so long dumping the state, it fires
again and starts to dump the state again in parallel, ad infinitum).

v2: Remember to print last not the stale rq iter after the loop.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
   drivers/gpu/drm/i915/intel_engine_cs.c | 43 +++++++++++++++++++++++---
   1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 66cddd059666..2398ea71e747 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1307,11 +1307,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
                      struct drm_printer *m,
                      const char *header, ...)
   {
+     const int MAX_REQUESTS_TO_SHOW = 8;
       struct intel_breadcrumbs * const b = &engine->breadcrumbs;
       const struct intel_engine_execlists * const execlists = &engine->execlists;
       struct i915_gpu_error * const error = &engine->i915->gpu_error;
-     struct i915_request *rq;
+     struct i915_request *rq, *last;
       struct rb_node *rb;
+     int count;
   
       if (header) {
               va_list ap;
@@ -1378,16 +1380,47 @@ void intel_engine_dump(struct intel_engine_cs *engine,
       }
   
       spin_lock_irq(&engine->timeline->lock);
-     list_for_each_entry(rq, &engine->timeline->requests, link)
-             print_request(m, rq, "        E ");
+
+     last = NULL;
+     count = 0;
+     list_for_each_entry(rq, &engine->timeline->requests, link) {
+             if (count++ < MAX_REQUESTS_TO_SHOW - 1)
+                     print_request(m, rq, "        E ");
+             else
+                     last = rq;

else {
         last = list_last_entry(...) ?
         break;
}

+     }
+     if (last) {
+             if (count > MAX_REQUESTS_TO_SHOW) {
+                     drm_printf(m,
+                                "        ...skipping %d executing requests...\n",
+                                count - MAX_REQUESTS_TO_SHOW);
+             }
+             print_request(m, last, "        E ");
+     }

Or even stuff this printf in the first loop above, under the else
branch. Maybe shorter would be like this, module off by ones if I made some:

list_for_each_entry(rq, &engine->timeline->requests, link) {
         struct i915_request *pr = rq;

         if (++count > MAX_REQUESTS_TO_SHOW) {
                 pr = list_last_entry(...);
                 drm_printf(m,
                            "        ...skipping %d executing requests...\n",
                            count - MAX_REQUESTS_TO_SHOW);
         }
         
         print_request(m, pr, "        E ");
         
         if (count > MAX_REQUESTS_TO_SHOW)
                 break;
}

+
+     last = NULL;
+     count = 0;
       drm_printf(m, "        Queue priority: %d\n", execlists->queue_priority);
       for (rb = execlists->first; rb; rb = rb_next(rb)) {
               struct i915_priolist *p =
                       rb_entry(rb, typeof(*p), node);
   
-             list_for_each_entry(rq, &p->requests, sched.link)
-                     print_request(m, rq, "        Q ");
+             list_for_each_entry(rq, &p->requests, sched.link) {
+                     if (count++ < MAX_REQUESTS_TO_SHOW - 1)
+                             print_request(m, rq, "        Q ");
+                     else
+                             last = rq;
+             }
       }
+     if (last) {
+             if (count > MAX_REQUESTS_TO_SHOW) {
+                     drm_printf(m,
+                                "        ...skipping %d queued requests...\n",
+                                count - MAX_REQUESTS_TO_SHOW);
+             }
+             print_request(m, last, "        Q ");
+     }

Then I am thinking how to avoid the duplication and extract the smart
printer. Macro would be easy at least, if a bit ugly.

Yeah, and for the moment, I'd like to keep the duplication obvious and
keep the two passes very similar.
  
Another idea comes to mind, but probably for the future, to merge same
prio, context and strictly consecutive seqnos to a single line of output
like:

   [prefix]seqno-seqno [%llx:seqno-seqno] (%u consecutive requests)

Give or take, but it will be more involved to do that.

Deciding when rq are equivalent will get messy, and going to drift.


+
       spin_unlock_irq(&engine->timeline->lock);
   
       spin_lock_irq(&b->rb_lock);


Looks OK in general, just please see if you like my idea to contain the
logic within a single loop and maybe even move it to a macro.

I don't, because it missed one important thing, the count of skipped
requests. :-p

Doh.. Well, I don't like it, but have no better/easier ideas at the moment.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx