Quoting Tvrtko Ursulin (2018-04-24 10:27:23)
> 
> On 24/04/2018 09:16, Chris Wilson wrote:
> > If we have more than a few, possibly several thousand requests in the
> > queue, don't show the central portion, just the first few and the last
> > being executed and/or queued. The first few should be enough to help
> > identify a problem in execution, and most often comparing the first/last
> > in the queue is enough to identify problems in the scheduling.
> > 
> > We may need some fine tuning to set MAX_REQUESTS_TO_SHOW for common
> > debug scenarios, but for the moment if we can avoid spending more
> > than a few seconds dumping the GPU state that will avoid a nasty
> > livelock (where hangcheck spends so long dumping the state, it fires
> > again and starts to dump the state again in parallel, ad infinitum).
> > 
> > v2: Remember to print last, not the stale rq iter, after the loop.
> > 
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > ---
> >   drivers/gpu/drm/i915/intel_engine_cs.c | 43 +++++++++++++++++++++++---
> >   1 file changed, 38 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 66cddd059666..2398ea71e747 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -1307,11 +1307,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >                          struct drm_printer *m,
> >                          const char *header, ...)
> > {
> > +	const int MAX_REQUESTS_TO_SHOW = 8;
> >   	struct intel_breadcrumbs * const b = &engine->breadcrumbs;
> >   	const struct intel_engine_execlists * const execlists = &engine->execlists;
> >   	struct i915_gpu_error * const error = &engine->i915->gpu_error;
> > -	struct i915_request *rq;
> > +	struct i915_request *rq, *last;
> >   	struct rb_node *rb;
> > +	int count;
> > 
> >   	if (header) {
> >   		va_list ap;
> > @@ -1378,16 +1380,47 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >   	}
> > 
> >   	spin_lock_irq(&engine->timeline->lock);
> > -	list_for_each_entry(rq, &engine->timeline->requests, link)
> > -		print_request(m, rq, " E ");
> > +
> > +	last = NULL;
> > +	count = 0;
> > +	list_for_each_entry(rq, &engine->timeline->requests, link) {
> > +		if (count++ < MAX_REQUESTS_TO_SHOW - 1)
> > +			print_request(m, rq, " E ");
> > +		else
> > +			last = rq;
> 
> else {
> 	last = list_last_entry(...) ?
> 	break;
> }
> 
> > +	}
> > +	if (last) {
> > +		if (count > MAX_REQUESTS_TO_SHOW) {
> > +			drm_printf(m,
> > +				   " ...skipping %d executing requests...\n",
> > +				   count - MAX_REQUESTS_TO_SHOW);
> > +		}
> > +		print_request(m, last, " E ");
> > +	}
> 
> Or even stuff this printf in the first loop above, under the else
> branch.
> Maybe shorter would be like this, modulo off-by-ones if I made some:
> 
> list_for_each_entry(rq, &engine->timeline->requests, link) {
> 	struct i915_request *pr = rq;
> 
> 	if (++count > MAX_REQUESTS_TO_SHOW) {
> 		pr = list_last_entry(...);
> 		drm_printf(m,
> 			   " ...skipping %d executing requests...\n",
> 			   count - MAX_REQUESTS_TO_SHOW);
> 	}
> 
> 	print_request(m, pr, " E ");
> 
> 	if (count > MAX_REQUESTS_TO_SHOW)
> 		break;
> }
> 
> > +
> > +	last = NULL;
> > +	count = 0;
> >   	drm_printf(m, " Queue priority: %d\n", execlists->queue_priority);
> >   	for (rb = execlists->first; rb; rb = rb_next(rb)) {
> >   		struct i915_priolist *p =
> >   			rb_entry(rb, typeof(*p), node);
> > 
> > -		list_for_each_entry(rq, &p->requests, sched.link)
> > -			print_request(m, rq, " Q ");
> > +		list_for_each_entry(rq, &p->requests, sched.link) {
> > +			if (count++ < MAX_REQUESTS_TO_SHOW - 1)
> > +				print_request(m, rq, " Q ");
> > +			else
> > +				last = rq;
> > +		}
> >   	}
> > +	if (last) {
> > +		if (count > MAX_REQUESTS_TO_SHOW) {
> > +			drm_printf(m,
> > +				   " ...skipping %d queued requests...\n",
> > +				   count - MAX_REQUESTS_TO_SHOW);
> > +		}
> > +		print_request(m, last, " Q ");
> > +	}
> 
> Then I am thinking how to avoid the duplication and extract the smart
> printer. Macro would be easy at least, if a bit ugly.

Yeah, and for the moment, I'd like to keep the duplication obvious and
keep the two passes very similar.

> Another idea comes to mind, but probably for the future, to merge same
> prio, context and strictly consecutive seqnos to a single line of output
> like:
> 
> [prefix]seqno-seqno [%llx:seqno-seqno] (%u consecutive requests)
> 
> Give or take, but it will be more involved to do that.

Deciding when rq are equivalent will get messy and is going to drift.

> > +
> >   	spin_unlock_irq(&engine->timeline->lock);
> > 
> >   	spin_lock_irq(&b->rb_lock);
> > 
> 
> Looks OK in general, just please see if you like my idea to contain the
> logic within a single loop and maybe even move it to a macro.
I don't, because it missed one important thing, the count of skipped
requests. :-p
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
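[Editor's note: for readers following along, the truncation pattern the patch implements — print the first MAX_REQUESTS_TO_SHOW - 1 entries, withhold the rest while remembering only the newest, then report the skip count before printing that last one — can be sketched outside the kernel. This is a minimal userspace illustration with a plain array standing in for the request list; `dump_truncated`, `MAX_TO_SHOW`, and the seqno array are stand-in names, not the i915 code.]

```c
#include <stdio.h>

#define MAX_TO_SHOW 8

/*
 * Print the first MAX_TO_SHOW - 1 entries, then a "...skipping N..."
 * line when anything was elided, then the final entry -- mirroring
 * the shape of each pass in the patch. Returns the number skipped.
 */
static int dump_truncated(FILE *out, const int *seqno, int n)
{
	int count = 0;
	int last = -1;	/* -1: nothing withheld yet (seqnos are positive) */

	for (int i = 0; i < n; i++) {
		if (count++ < MAX_TO_SHOW - 1)
			fprintf(out, " E %d\n", seqno[i]);
		else
			last = seqno[i];	/* remember only the newest */
	}

	if (last != -1) {
		if (count > MAX_TO_SHOW)
			fprintf(out, " ...skipping %d requests...\n",
				count - MAX_TO_SHOW);
		fprintf(out, " E %d\n", last);
	}

	return count > MAX_TO_SHOW ? count - MAX_TO_SHOW : 0;
}
```

With 20 queued entries this emits 7 leading lines, one skip line reporting 12 elided, and the final entry; with exactly MAX_TO_SHOW entries nothing is skipped, which is why the loop compares against MAX_TO_SHOW - 1 but the skip test compares against MAX_TO_SHOW.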