Quoting Tvrtko Ursulin (2018-04-24 10:40:57)
>
> On 24/04/2018 10:33, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-04-24 10:27:23)
> >>
> >> On 24/04/2018 09:16, Chris Wilson wrote:
> >>> If we have more than a few, possibly several thousand, requests in the
> >>> queue, don't show the central portion, just the first few and the last
> >>> being executed and/or queued. The first few should be enough to help
> >>> identify a problem in execution, and most often comparing the first/last
> >>> in the queue is enough to identify problems in the scheduling.
> >>>
> >>> We may need some fine tuning to set MAX_REQUESTS_TO_SHOW for common
> >>> debug scenarios, but for the moment, if we can avoid spending more
> >>> than a few seconds dumping the GPU state, that will avoid a nasty
> >>> livelock (where hangcheck spends so long dumping the state, it fires
> >>> again and starts to dump the state again in parallel, ad infinitum).
> >>>
> >>> v2: Remember to print last, not the stale rq iter, after the loop.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>> ---
> >>>  drivers/gpu/drm/i915/intel_engine_cs.c | 43 +++++++++++++++++++++++---
> >>>  1 file changed, 38 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> >>> index 66cddd059666..2398ea71e747 100644
> >>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> >>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> >>> @@ -1307,11 +1307,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >>>  		       struct drm_printer *m,
> >>>  		       const char *header, ...)
> >>>  {
> >>> +	const int MAX_REQUESTS_TO_SHOW = 8;
> >>>  	struct intel_breadcrumbs * const b = &engine->breadcrumbs;
> >>>  	const struct intel_engine_execlists * const execlists = &engine->execlists;
> >>>  	struct i915_gpu_error * const error = &engine->i915->gpu_error;
> >>> -	struct i915_request *rq;
> >>> +	struct i915_request *rq, *last;
> >>>  	struct rb_node *rb;
> >>> +	int count;
> >>>
> >>>  	if (header) {
> >>>  		va_list ap;
> >>> @@ -1378,16 +1380,47 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >>>  	}
> >>>
> >>>  	spin_lock_irq(&engine->timeline->lock);
> >>> -	list_for_each_entry(rq, &engine->timeline->requests, link)
> >>> -		print_request(m, rq, "\t\tE ");
> >>> +
> >>> +	last = NULL;
> >>> +	count = 0;
> >>> +	list_for_each_entry(rq, &engine->timeline->requests, link) {
> >>> +		if (count++ < MAX_REQUESTS_TO_SHOW - 1)
> >>> +			print_request(m, rq, "\t\tE ");
> >>> +		else
> >>> +			last = rq;
> >>
> >> else {
> >> 	last = list_last_entry(...) ?
> >> 	break;
> >> }
> >>
> >>> +	}
> >>> +	if (last) {
> >>> +		if (count > MAX_REQUESTS_TO_SHOW) {
> >>> +			drm_printf(m,
> >>> +				   "\t\t...skipping %d executing requests...\n",
> >>> +				   count - MAX_REQUESTS_TO_SHOW);
> >>> +		}
> >>> +		print_request(m, last, "\t\tE ");
> >>> +	}
> >>
> >> Or even stuff this printf in the first loop above, under the else
> >> branch.
> >> Maybe shorter would be like this, modulo off-by-ones if I made some:
> >>
> >> list_for_each_entry(rq, &engine->timeline->requests, link) {
> >> 	struct i915_request *pr = rq;
> >>
> >> 	if (++count > MAX_REQUESTS_TO_SHOW) {
> >> 		pr = list_last_entry(...);
> >> 		drm_printf(m,
> >> 			   "\t\t...skipping %d executing requests...\n",
> >> 			   count - MAX_REQUESTS_TO_SHOW);
> >> 	}
> >>
> >> 	print_request(m, pr, "\t\tE ");
> >>
> >> 	if (count > MAX_REQUESTS_TO_SHOW)
> >> 		break;
> >> }
> >>
> >>> +
> >>> +	last = NULL;
> >>> +	count = 0;
> >>>  	drm_printf(m, "\t\tQueue priority: %d\n", execlists->queue_priority);
> >>>  	for (rb = execlists->first; rb; rb = rb_next(rb)) {
> >>>  		struct i915_priolist *p =
> >>>  			rb_entry(rb, typeof(*p), node);
> >>>
> >>> -		list_for_each_entry(rq, &p->requests, sched.link)
> >>> -			print_request(m, rq, "\t\tQ ");
> >>> +		list_for_each_entry(rq, &p->requests, sched.link) {
> >>> +			if (count++ < MAX_REQUESTS_TO_SHOW - 1)
> >>> +				print_request(m, rq, "\t\tQ ");
> >>> +			else
> >>> +				last = rq;
> >>> +		}
> >>>  	}
> >>> +	if (last) {
> >>> +		if (count > MAX_REQUESTS_TO_SHOW) {
> >>> +			drm_printf(m,
> >>> +				   "\t\t...skipping %d queued requests...\n",
> >>> +				   count - MAX_REQUESTS_TO_SHOW);
> >>> +		}
> >>> +		print_request(m, last, "\t\tQ ");
> >>> +	}
> >>
> >> Then I am thinking how to avoid the duplication and extract the smart
> >> printer. A macro would be easy at least, if a bit ugly.
> >
> > Yeah, and for the moment, I'd like to keep the duplication obvious and
> > keep the two passes very similar.
> >
> >> Another idea comes to mind, but probably for the future: merge the same
> >> prio, context and strictly consecutive seqnos into a single line of
> >> output like:
> >>
> >> [prefix]seqno-seqno [%llx:seqno-seqno] (%u consecutive requests)
> >>
> >> Give or take, but it will be more involved to do that.
> >
> > Deciding when rq are equivalent will get messy, and is going to drift.
> >
> >>
> >>> +
> >>>  	spin_unlock_irq(&engine->timeline->lock);
> >>>
> >>>  	spin_lock_irq(&b->rb_lock);
> >>>
> >>
> >> Looks OK in general, just please see if you like my idea to contain the
> >> logic within a single loop and maybe even move it to a macro.
> >
> > I don't, because it missed one important thing, the count of skipped
> > requests. :-p
>
> Doh.. Well, I don't like it, but I have no better/easier ideas at the moment.
>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

I'll take it as an improvement that can be refined again and again and
again...

Thanks,
-Chris
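
For reference, a minimal userspace sketch of the truncation pattern the
patch adds: print the first MAX_REQUESTS_TO_SHOW - 1 entries, remember the
last one, and emit a skip notice if anything in between was dropped. The
struct request list, dump_requests() and main() harness below are invented
for this sketch, not the real i915 types; only the shape of the loop and
the final if (last) block mirror the change under review.

/*
 * Truncated list dump: show the first MAX_REQUESTS_TO_SHOW - 1 entries,
 * then the count of skipped entries (if any), then the final entry.
 */
#include <stdio.h>

#define MAX_REQUESTS_TO_SHOW 8

struct request {
	int seqno;
	struct request *next;
};

static void print_request(const struct request *rq, const char *prefix)
{
	printf("%s%d\n", prefix, rq->seqno);
}

static void dump_requests(const struct request *head)
{
	const struct request *rq, *last = NULL;
	int count = 0;

	for (rq = head; rq; rq = rq->next) {
		if (count++ < MAX_REQUESTS_TO_SHOW - 1)
			print_request(rq, "\t\tE ");	/* first few in full */
		else
			last = rq;			/* keep only the final entry */
	}

	if (last) {
		if (count > MAX_REQUESTS_TO_SHOW)
			printf("\t\t...skipping %d executing requests...\n",
			       count - MAX_REQUESTS_TO_SHOW);
		print_request(last, "\t\tE ");
	}
}

int main(void)
{
	struct request reqs[20];
	int i;

	/* Chain 20 fake requests: expect 7 printed, a notice for 12
	 * skipped, then the last one. */
	for (i = 0; i < 20; i++) {
		reqs[i].seqno = i + 1;
		reqs[i].next = i + 1 < 20 ? &reqs[i + 1] : NULL;
	}

	dump_requests(&reqs[0]);
	return 0;
}

Over the 20 fake requests built in main(), this prints seven "E" lines, a
"...skipping 12 executing requests..." notice, and then the final entry.
Note that a single-loop variant which breaks out early cannot report how
many entries it skipped without knowing the list length up front, which
seems to be the objection raised at the end of the thread.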