Re: [PATCH 1/2] drm/i915/gsc: flush the GSC worker before wedging on unload

"Teres Alexis, Alan Previn" <alan.previn.teres.alexis@xxxxxxxxx> · Tue, 7 Mar 2023 22:55:29 +0000

On Thu, 2023-02-23 at 09:21 -0800, Ceraolo Spurio, Daniele wrote:
> If we unload the driver and wedge before the GSC worker is complete,
> the worker will hit an error on its submission to the GSC engine and
> then exit. This is hard to hit for a user, but it is reproducible
> with skipping selftests. The error is handled gracefully by the
> worker, so there are no functional issues, but we still end up with
> an error message in dmesg, which is something we want to avoid as
> this is a supported scenario. We could modify the worker to better
> handle a wedging occurring during its execution, but that gets
> complicated for a couple of reasons:
> - We do want the error on runtime wedging, because there are
>   implications for subsystems outside of GT (i.e., PXP, HDCP), it's
>   only the error on driver unload that we want to silence.
> - The worker is responsible for multiple submissions (GSC FW load,
>   HuC auth, SW proxy), so all of those will have to be adapted to
>   handle the wedged_on_fini scenario.
> Therefore, it's much simpler to just wait for the worker to be done
> before wedging on driver removal, also considering that the worker
> will likely already be idle in the great majority of non-selftest
> scenarios.
> 
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
> 
alan:snip
> 
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -784,6 +784,29 @@ void intel_gt_driver_unregister(struct intel_gt *gt)
>  	intel_rps_driver_unregister(&gt->rps);
>  	intel_gsc_fini(&gt->gsc);
>  
> +	/*
> +	 * If we unload the driver and wedge before the GSC worker is complete,
alan:snip
> +	 * scenarios.
> +	 */
> +	intel_gsc_uc_flush_work(&gt->uc.gsc);

alan: nit: from a code readiblity, having intel_gsc_fini before intel_gsc_uc_flush_work almost reads
as if code should have been inverted eventhough we know that the former is for the old mei-comp based
framework and the latter is based on the new gsc-cs based framework. i cant think of how else to resolve
this other unintrusively other than adding a comment. Other than that, also considering we've had offline
testing already verify this and the next patch too, LGTM:
Reviewed-by: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>