Re: [PATCH] drm/i915/guc: Fix the fix for reset lock confusion

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi John,

On Fri, Mar 29, 2024 at 04:53:05PM -0700, John.C.Harrison@xxxxxxxxx wrote:
> From: John Harrison <John.C.Harrison@xxxxxxxxx>
> 
> The previous fix for the circlular lock splat about the busyness
> worker wasn't quite complete. Even though the reset-in-progress flag
> is cleared at the start of intel_uc_reset_finish, the entire function
> is still inside the reset mutex lock. Not sure why the patch appeared
> to fix the issue both locally and in CI. However, it is now back
> again.
> 
> There is a further complication the wedge code path within
> intel_gt_reset() jumps around so much it results in nested
> reset_prepare/_finish calls. That is, the call sequence is:
>   intel_gt_reset
>   | reset_prepare
>   | __intel_gt_set_wedged
>   | | reset_prepare
>   | | reset_finish
>   | reset_finish
> 
> The nested finish means that even if the clear of the in-progress flag
> was moved to the end of _finish, it would still be clear for the
> entire second call. Surprisingly, this does not seem to be causing any
> other problems at present.
> 
> As an aside, a wedge on fini does not call the finish functions at
> all. The reset_in_progress flag is left set (twice).
> 
> So instead of trying to cancel the worker anywhere at all in the reset
> path, just add a cancel to intel_guc_submission_fini instead. Note
> that it is not a problem if the worker is still active during a reset.
> Either it will run before the reset path starts locking things and
> will simply block the reset code for a tiny amount of time. Or it will
> run after the locks have been acquired and will early exit due to the
> try-lock.
> 
> Also, do not use the reset-in-progress flag to decide whether a
> synchronous cancel is safe (from a lockdep perspective) or not.
> Instead, use the actual reset mutex state (both the genuine one and
> the custom rolled BACKOFF one).
> 
> Fixes: 0e00a8814eec ("drm/i915/guc: Avoid circular locking issue on busyness flush")
> Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx>
> Cc: Zhanjun Dong <zhanjun.dong@xxxxxxxxx>
> Cc: John Harrison <John.C.Harrison@xxxxxxxxx>
> Cc: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
> Cc: Daniel Vetter <daniel@xxxxxxxx>
> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
> Cc: Nirmoy Das <nirmoy.das@xxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@xxxxxxxxx>
> Cc: Andrzej Hajda <andrzej.hajda@xxxxxxxxx>
> Cc: Matt Roper <matthew.d.roper@xxxxxxxxx>
> Cc: Jonathan Cavitt <jonathan.cavitt@xxxxxxxxx>
> Cc: Prathap Kumar Valsan <prathap.kumar.valsan@xxxxxxxxx>
> Cc: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
> Cc: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@xxxxxxxxx>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
> Cc: Ashutosh Dixit <ashutosh.dixit@xxxxxxxxx>
> Cc: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@xxxxxxxxx>

Looks good:

Reviewed-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>

Thanks,
Andi



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux