Hi, > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > > > index a0e3ef1c65d2..600388c849f7 100644 > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > > > @@ -1359,7 +1359,16 @@ static void guc_enable_busyness_worker(struct intel_guc *guc) > > > > static void guc_cancel_busyness_worker(struct intel_guc *guc) > > > > { > > > > - cancel_delayed_work_sync(&guc->timestamp.work); > > > > + /* > > > > + * When intel_gt_reset was called, task will hold a lock. > > > > + * To cacel delayed work here, the _sync version will also acquire a lock, which might > > > > + * trigger the possible cirular locking dependency warning. > > > > + * Check the reset_in_progress flag, call async verion if reset is in progress. > > > > + */ > > > This needs to explain in much more detail what is going on and why it is not > > > a problem. E.g.: > > > > > > The busyness worker needs to be cancelled. In general that means > > > using the synchronous cancel version to ensure that an in-progress > > > worker will not keep executing beyond whatever is happening that > > > needs the cancel. E.g. suspend, driver unload, etc. However, in the > > > case of a reset, the synchronous version is not required and can > > > trigger a false deadlock detection warning. > > > > > > The business worker takes the reset mutex to protect against resets > > > interfering with it. However, it does a trylock and bails out if the > > > reset lock is already acquired. Thus there is no actual deadlock or > > > other concern with the worker running concurrently with a reset. So > > > an asynchronous cancel is safe in the case of a reset rather than a > > > driver unload or suspend type operation. On the other hand, if the > > > cancel_sync version is used when a reset is in progress then the > > > mutex deadlock detection sees the mutex being acquired through > > > multiple paths and complains. > > > > > > So just don't bother. That keeps the detection code happy and is > > > safe because of the trylock code described above. > > So why do we even need to cancel anything if it doesn't do anything while > > the reset is in progress? > It still needs to be cancelled. The worker only aborts if it is actively > executing concurrently with the reset. It might not start to execute until > after the reset has completed. And there is presumably a reason why the > cancel is being called, a reason not necessarily related to resets at all. > Leaving the worker to run arbitrarily after the driver is expecting it to be > stopped will lead to much worse things than a fake lockdep splat, e.g. a use > after free pointer deref. I was actually thinking why not leave things as they are and just disable lockdep from CI. This doesn't look like a relevant report to me. Andi