Re: [PATCH] drm/i915/guc: Log engine resets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 17, 2021 at 12:15:53PM +0000, Tvrtko Ursulin wrote:
> 
> On 14/12/2021 15:07, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > 
> > Log engine resets done by the GuC firmware in the similar way it is done
> > by the execlists backend.
> > 
> > This way we have notion of where the hangs are before the GuC gains
> > support for proper error capture.
> 
> Ping - any interest to log this info?
> 
> All there currently is a non-descriptive "[drm] GPU HANG: ecode
> 12:0:00000000".
>

Yea, this could be helpful. One suggestion below.

> Also, will GuC be reporting the reason for the engine reset at any point?
>

We are working on the error state capture, presumably the registers will
give a clue what caused the hang.

As for the GuC providing a reason, that isn't defined in the interface
but that is decent idea to provide a hint in G2H what the issue was. Let
me run that by the i915 GuC developers / GuC firmware team and see what
they think. 

> Regards,
> 
> Tvrtko
> 
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
> > Cc: John Harrison <John.C.Harrison@xxxxxxxxx>
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 12 +++++++++++-
> >   1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 97311119da6f..51512123dc1a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -11,6 +11,7 @@
> >   #include "gt/intel_context.h"
> >   #include "gt/intel_engine_pm.h"
> >   #include "gt/intel_engine_heartbeat.h"
> > +#include "gt/intel_engine_user.h"
> >   #include "gt/intel_gpu_commands.h"
> >   #include "gt/intel_gt.h"
> >   #include "gt/intel_gt_clock_utils.h"
> > @@ -3934,9 +3935,18 @@ static void capture_error_state(struct intel_guc *guc,
> >   {
> >   	struct intel_gt *gt = guc_to_gt(guc);
> >   	struct drm_i915_private *i915 = gt->i915;
> > -	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> > +	struct intel_engine_cs *engine = ce->engine;
> >   	intel_wakeref_t wakeref;
> > +	if (intel_engine_is_virtual(engine)) {
> > +		drm_notice(&i915->drm, "%s class, engines 0x%x; GuC engine reset\n",
> > +			   intel_engine_class_repr(engine->class),
> > +			   engine->mask);
> > +		engine = guc_virtual_get_sibling(engine, 0);
> > +	} else {
> > +		drm_notice(&i915->drm, "%s GuC engine reset\n", engine->name);

Probably include the guc_id of the context too then?

Matt

> > +	}
> > +
> >   	intel_engine_set_hung_context(engine, ce);
> >   	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
> >   		i915_capture_error_state(gt, engine->mask);
> > 



[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux