Thanks for reviewing - responses below.

On Thu, 2023-01-19 at 14:35 -0500, Vivi, Rodrigo wrote:
> On Thu, Jan 12, 2023 at 05:18:49PM -0800, Alan Previn wrote:
> > A driver bug was recently discovered where the security firmware was
> > receiving internal HW signals indicating that session key expirations
> > had occurred. Architecturally, the firmware was expecting a response
> > from the GuC to acknowledge the event with the firmware side.
> > However the OS was in a suspended state and GuC had been reset.
> > 
> > Internal specifications actually required the driver to ensure
> > that all active sessions be properly cleaned up in such cases where
> > the system is suspended and the GuC potentially unable to respond.
> > 
> > This patch adds the global teardown code in i915's suspend_prepare
> > code path.
> > 
> > Signed-off-by: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
> > Reviewed-by: Juston Li <justonli@xxxxxxxxxxxx>
> > 
Alan: [snip]
> > 
> > +static int __pxp_global_teardown_locked(struct intel_pxp *pxp, bool terminate_for_cleanup)
> > +{
> > +	if (terminate_for_cleanup) {
> > +		if (!pxp->arb_is_valid)
> > +			return 0;
> > +		/*
> > +		 * To ensure synchronous and coherent session teardown completion
> > +		 * in response to suspend or shutdown triggers, don't use a worker.
> > +		 */
> > +		intel_pxp_mark_termination_in_progress(pxp);
> > +		intel_pxp_terminate(pxp, false);
> > +	} else {
> > +		if (pxp->arb_is_valid)
> > +			return 0;
> > +		/*
> > +		 * If we are not in final termination, and the arb-session is currently
> > +		 * inactive, we are doing a reset and restart due to some runtime event.
> > +		 * Use the worker that was designed for this.
> > +		 */
> > +		pxp_queue_termination(pxp);
> > +	}
> 
> I really don't see why you need 1 function for totally 2 different cases.
> Why not 2 functions then?
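For illustration only (not part of the posted patch): a hypothetical sketch of the two-function split being discussed, with the shared 250 ms wait factored into one helper. All names other than intel_pxp_mark_termination_in_progress(), intel_pxp_terminate() and pxp_queue_termination() are made up for this sketch, and the kernel primitives (struct completion, wait_for_completion_timeout(), the worker) are replaced by simplified userspace stubs so it compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace stand-ins for the i915/kernel pieces. These are simplified,
 * hypothetical stubs, not the real driver code.
 */
struct completion {
	bool done;
};

struct intel_pxp {
	bool arb_is_valid;
	bool hw_state_invalidated;
	struct completion termination;
};

/* Stub: the real wait_for_completion_timeout() sleeps up to the timeout. */
static bool pxp_stub_wait(struct completion *c, unsigned int timeout_ms)
{
	(void)timeout_ms;
	return c->done;
}

/* Stub: re-arm the completion before a new teardown starts. */
static void intel_pxp_mark_termination_in_progress(struct intel_pxp *pxp)
{
	pxp->termination.done = false;
}

/* Stub: pretend the firmware acknowledges the teardown immediately. */
static void intel_pxp_terminate(struct intel_pxp *pxp, bool restart_arb)
{
	pxp->hw_state_invalidated = restart_arb;
	pxp->arb_is_valid = false;
	pxp->termination.done = true;
}

/* Stub: the real helper queues a worker; here the work runs inline. */
static void pxp_queue_termination(struct intel_pxp *pxp)
{
	intel_pxp_mark_termination_in_progress(pxp);
	intel_pxp_terminate(pxp, true);
}

/* Shared tail of both paths: wait up to 250 ms for the teardown to finish. */
static int pxp_wait_for_termination(struct intel_pxp *pxp)
{
	if (!pxp_stub_wait(&pxp->termination, 250))
		return -ETIMEDOUT;
	return 0;
}

/* Hypothetical: suspend/shutdown path, synchronous teardown without a worker. */
static int pxp_teardown_for_cleanup(struct intel_pxp *pxp)
{
	if (!pxp->arb_is_valid)
		return 0;
	intel_pxp_mark_termination_in_progress(pxp);
	intel_pxp_terminate(pxp, false);
	return pxp_wait_for_termination(pxp);
}

/* Hypothetical: runtime reset path, arb session already inactive, use the worker. */
static int pxp_teardown_for_restart(struct intel_pxp *pxp)
{
	if (pxp->arb_is_valid)
		return 0;
	pxp_queue_termination(pxp);
	return pxp_wait_for_termination(pxp);
}
```

With this shape, each caller names the path it wants instead of passing a bool; a suspend-time caller such as intel_pxp_end() would presumably use the cleanup variant.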
> 
Alan: I don't see why not ;) My goal with the above method was to concentrate
the teardown steps in a single function, so that if future changes are required,
we can keep them in this single function entry point. For now I will assume that
was a nack, so I shall split it in the next rev.
> > +
> > +	if (!wait_for_completion_timeout(&pxp->termination, msecs_to_jiffies(250)))
> > +		return -ETIMEDOUT;
> > +
> > +	return 0;
> > +}
> > +
> > 
Alan: [snip]
> > diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.h b/drivers/gpu/drm/i915/pxp/intel_pxp.h
> > index 9658d3005222..3ded0890cd27 100644
> > --- a/drivers/gpu/drm/i915/pxp/intel_pxp.h
> > +++ b/drivers/gpu/drm/i915/pxp/intel_pxp.h
> > @@ -27,6 +27,7 @@ void intel_pxp_mark_termination_in_progress(struct intel_pxp *pxp);
> >  void intel_pxp_tee_end_arb_fw_session(struct intel_pxp *pxp, u32 arb_session_id);
> >  
> >  int intel_pxp_start(struct intel_pxp *pxp);
> > +void intel_pxp_end(struct intel_pxp *pxp);
> >  
> >  int intel_pxp_key_check(struct intel_pxp *pxp,
> >  			struct drm_i915_gem_object *obj,
> > diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_pm.c b/drivers/gpu/drm/i915/pxp/intel_pxp_pm.c
> > index 892d39cc61c1..e427464aa131 100644
> > --- a/drivers/gpu/drm/i915/pxp/intel_pxp_pm.c
> > +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_pm.c
> > @@ -16,7 +16,7 @@ void intel_pxp_suspend_prepare(struct intel_pxp *pxp)
> >  	if (!intel_pxp_is_enabled(pxp))
> >  		return;
> >  
> > -	pxp->arb_is_valid = false;
> > +	intel_pxp_end(pxp);
> >  
> >  	intel_pxp_invalidate(pxp);
> >  }
> > diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> > index 74ed7e16e481..d8278c4002e3 100644
> > --- a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> > +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> > @@ -115,11 +115,14 @@ static int pxp_terminate_arb_session_and_global(struct intel_pxp *pxp)
> >  	return ret;
> >  }
> >  
> > -static void pxp_terminate(struct intel_pxp *pxp)
> > +void intel_pxp_terminate(struct intel_pxp *pxp, bool restart_arb)
> >  {
> >  	int ret;
> >  
> > -	pxp->hw_state_invalidated = true;
> > +	if (restart_arb)
> > +		pxp->hw_state_invalidated = true;
> > +	else
> > +		pxp->hw_state_invalidated = false;
> 
> o.O
> 
> pxp->hw_state_invalidate = restart_arb;
Alan: duhhhh... (my bad)
> 
> ?
> 
> or even a better name for the restart_arb to already indicate that is
> the hw_state_invalidate ?
> 
Alan: hmmm... do you mean something like:
	hw_state_invalidated = post_invalidation_needs_restart;

Alan: actually, I wish we could redo "hw_state_invalidated", which is currently
defined as a boolean that only means one thing: teardown and restart. It would be
more scalable to replace it with a bitmask of "current + (inferred) pending state",
backed by a documented state machine with a fixed set of state-transition paths:

  INACTIVE----> STARTING----> ACTIVE ----> TEARDOWN_RESTART--->|
     ^             ^            |                              |
     |             |            |                              V
     |             |<-----------)-------------<----------------|
     |                          |
     |                          |-----> TEARDOWN_END---->--|
     |                                                     V
     |<-----------------<----------------<-----------------|

However, I didn't do this initially because it would mean a wider set of changes
that might take more time to test and review (with downstream customer impact),
for only 5 states, of which only 2 are impacted by this change. For now I shall
go with the simpler name change you hinted at above, unless you request this
instead.

Alan: [snip]
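For illustration only: a minimal sketch of how the state diagram above could be encoded as an explicit transition check. It assumes a plain enum rather than the "current + pending" bitmask mentioned above, and every name in it (enum pxp_state, pxp_state_transition_valid(), struct pxp_sm, pxp_sm_set(), pxp_sm_restart_pending()) is hypothetical; nothing like it exists in i915 today. It uses assert() where kernel code would more likely use a drm_WARN_ON()-style check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states, one per box in the diagram above. */
enum pxp_state {
	PXP_STATE_INACTIVE,
	PXP_STATE_STARTING,
	PXP_STATE_ACTIVE,
	PXP_STATE_TEARDOWN_RESTART,
	PXP_STATE_TEARDOWN_END,
};

/* True only for the arrows drawn in the diagram. */
static bool pxp_state_transition_valid(enum pxp_state from, enum pxp_state to)
{
	switch (from) {
	case PXP_STATE_INACTIVE:
		return to == PXP_STATE_STARTING;
	case PXP_STATE_STARTING:
		return to == PXP_STATE_ACTIVE;
	case PXP_STATE_ACTIVE:
		return to == PXP_STATE_TEARDOWN_RESTART ||
		       to == PXP_STATE_TEARDOWN_END;
	case PXP_STATE_TEARDOWN_RESTART:
		return to == PXP_STATE_STARTING;
	case PXP_STATE_TEARDOWN_END:
		return to == PXP_STATE_INACTIVE;
	}
	return false;
}

struct pxp_sm {
	enum pxp_state state;
};

/* Apply a transition, refusing any path not in the diagram. */
static void pxp_sm_set(struct pxp_sm *sm, enum pxp_state to)
{
	assert(pxp_state_transition_valid(sm->state, to));
	sm->state = to;
}

/* The single meaning hw_state_invalidated carries today: restart pending. */
static bool pxp_sm_restart_pending(const struct pxp_sm *sm)
{
	return sm->state == PXP_STATE_TEARDOWN_RESTART;
}
```

Under this encoding, the suspend path in this patch would be ACTIVE to TEARDOWN_END to INACTIVE, and the runtime-reset path ACTIVE to TEARDOWN_RESTART to STARTING, so the two cases the bool restart_arb distinguishes become two distinct transitions.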