On 24/04/2018 09:16, Chris Wilson wrote:
If we have more than a few, possibly several thousand request in the
queue, don't show the central portion, just the first few and the last
being executed and/or queued. The first few should be enough to help
identify a problem in execution, and most often comparing the first/last
in the queue is enough to identify problems in the scheduling.
We may need some fine tuning to set MAX_REQUESTS_TO_SHOW for common
debug scenarios, but for the moment if we can avoiding spending more
than a few seconds dumping the GPU state that will avoid a nasty
livelock (where hangcheck spends so long dumping the state, it fires
again and starts to dump the state again in parallel, ad infinitum).
v2: Remember to print last not the stale rq iter after the loop.
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
drivers/gpu/drm/i915/intel_engine_cs.c | 43 +++++++++++++++++++++++---
1 file changed, 38 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 66cddd059666..2398ea71e747 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1307,11 +1307,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
struct drm_printer *m,
const char *header, ...)
{
+ const int MAX_REQUESTS_TO_SHOW = 8;
struct intel_breadcrumbs * const b = &engine->breadcrumbs;
const struct intel_engine_execlists * const execlists = &engine->execlists;
struct i915_gpu_error * const error = &engine->i915->gpu_error;
- struct i915_request *rq;
+ struct i915_request *rq, *last;
struct rb_node *rb;
+ int count;
if (header) {
va_list ap;
@@ -1378,16 +1380,47 @@ void intel_engine_dump(struct intel_engine_cs *engine,
}
spin_lock_irq(&engine->timeline->lock);
- list_for_each_entry(rq, &engine->timeline->requests, link)
- print_request(m, rq, " E ");
+
+ last = NULL;
+ count = 0;
+ list_for_each_entry(rq, &engine->timeline->requests, link) {
+ if (count++ < MAX_REQUESTS_TO_SHOW - 1)
+ print_request(m, rq, " E ");
+ else
+ last = rq;
else {
last = list_last_entry(...) ?
break;
}
+ }
+ if (last) {
+ if (count > MAX_REQUESTS_TO_SHOW) {
+ drm_printf(m,
+ " ...skipping %d executing requests...\n",
+ count - MAX_REQUESTS_TO_SHOW);
+ }
+ print_request(m, last, " E ");
+ }
Or even stuff this printf in the first loop above, under the else
branch. Maybe shorter would be like this, module off by ones if I made some:
list_for_each_entry(rq, &engine->timeline->requests, link) {
struct i915_request *pr = rq;
if (++count > MAX_REQUESTS_TO_SHOW) {
pr = list_last_entry(...);
drm_printf(m,
" ...skipping %d executing requests...\n",
count - MAX_REQUESTS_TO_SHOW);
}
print_request(m, pr, " E ");
if (count > MAX_REQUESTS_TO_SHOW)
break;
}
+
+ last = NULL;
+ count = 0;
drm_printf(m, " Queue priority: %d\n", execlists->queue_priority);
for (rb = execlists->first; rb; rb = rb_next(rb)) {
struct i915_priolist *p =
rb_entry(rb, typeof(*p), node);
- list_for_each_entry(rq, &p->requests, sched.link)
- print_request(m, rq, " Q ");
+ list_for_each_entry(rq, &p->requests, sched.link) {
+ if (count++ < MAX_REQUESTS_TO_SHOW - 1)
+ print_request(m, rq, " Q ");
+ else
+ last = rq;
+ }
}
+ if (last) {
+ if (count > MAX_REQUESTS_TO_SHOW) {
+ drm_printf(m,
+ " ...skipping %d queued requests...\n",
+ count - MAX_REQUESTS_TO_SHOW);
+ }
+ print_request(m, last, " Q ");
+ }
Then I am thinking how to avoid the duplication and extract the smart
printer. Macro would be easy at least, if a bit ugly.