On Tue, 2023-06-20 at 09:30 -0500, Balasubrawmanian, Vivaik wrote:
> On 6/1/2023 12:45 PM, Alan Previn wrote:
> > After recent discussions with Mesa folks, it was requested
> > that we optimize i915's GET_PARAM for the PXP_STATUS without
> > changing the UAPI spec.
> >
> > This patch adds these additional optimizations:
> >  - If any PXP initialization flow failed, then ensure that
> >    we catch it so that we can change the returned PXP_STATUS
> >    from "2" (i.e. 'PXP is supported but not yet ready')
> >    to "-ENODEV". This typically should not happen and if it
> >    does, we have a platform configuration.
> >  - If a PXP arbitration session creation event failed
> >    due to incorrect firmware version or blocking SOC fusing
> >    or blocking BIOS configuration (platform reasons that won't
> >    change if we retry), then reflect that blockage by also
> >    returning -ENODEV in the GET_PARAM-PXP_STATUS.
> >  - GET_PARAM:PXP_STATUS should not wait at all if PXP is
> >    supported but non-i915 dependencies (component-driver /
> >    firmware) we are still pending to complete the init flows.
> >    In this case, just return "2" immediately (i.e. 'PXP is
> >    supported but not yet ready').
> >
> > Signed-off-by: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c  | 11 +++++++++-
> >  drivers/gpu/drm/i915/i915_getparam.c       |  2 +-
> >  drivers/gpu/drm/i915/pxp/intel_pxp.c       | 25 ++++++++++++++++++----
> >  drivers/gpu/drm/i915/pxp/intel_pxp.h       |  2 +-
> >  drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c |  7 +++---
> >  drivers/gpu/drm/i915/pxp/intel_pxp_tee.c   |  7 +++---
> >  drivers/gpu/drm/i915/pxp/intel_pxp_types.h |  9 ++++++++
> >  7 files changed, 50 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
> > index fb0984f875f9..4dd744c96a37 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c
> > @@ -42,8 +42,17 @@ static void gsc_work(struct work_struct *work)
> >  	}
> >
> >  	ret = intel_gsc_proxy_request_handler(gsc);
> > -	if (ret)
> > +	if (ret) {
> > +		if (actions & GSC_ACTION_FW_LOAD) {
> > +			/*
> > +			 * a proxy request failure that came together with the
> > +			 * firmware load action means the last part of init has
> > +			 * failed so GSC fw won't be usable after this
> > +			 */
> > +			intel_uc_fw_change_status(&gsc->fw, INTEL_UC_FIRMWARE_LOAD_FAIL);
> > +		}
> >  		goto out_put;
> > +	}
> >
> >  	/* mark the GSC FW init as done the first time we run this */
> >  	if (actions & GSC_ACTION_FW_LOAD) {
>
> On the huc authentication comment block above this snippet, the last
> statement: "Note that we can only do the GSC auth if the GuC auth was"
> is confusing as the code below is only dealing with HuC Authentication.

alan: i believe what he meant was "can only do the GSC-based auth if the
GuC-based auth"... but I can't change that code as part of this patch - I
believe the rule for kernel patches is that each single patch must be
modular (not mixing unrelated changes) and focus just on what it's described
to do. IIRC, we would need to create a separate patch review for that change.
>
> This function seems to have a section to deal with FW load action and
> another to deal with SW Proxy requests, but we seem to be mixing both
> actions in the SW proxy section. instead, can we move this call to
> intel_gsc_proxy_request_handler to the FW load section itself instead of
> handling it as an additional check in the SW_proxy section? In the same
> vein, we should also move the intel_uc_fw_change_status() call into the
> above FW Load action section. i think that way the code reads better.

alan: GSC_ACTION_FW_LOAD is used for loading the GSC firmware, which is a
one-time thing per i915 load. However, GSC_ACTION_SW_PROXY events can happen
any time the GSC fw needs to communicate with CSE firmware (or vice versa)
due to platform events that may not have been triggered by i915, long after
init. However, the rule is that after the GSC FW is loaded, i915 is required
to do a one-time proxy-init step to prime both GSC and CSE fws that proxy
comms is available. Without this step, we can't use the gsc-fw for other ops.
So to recap the rules:
    1. we launch the worker to do the one-time GSC firmware load.
    2. after the GSC firmware load is successful, we have to do a one-time
       SW-proxy init.
       -> this is why we add the GSC_ACTION_SW_PROXY flag on successful
          load completion.
    3. If we are doing proxy-handling for the very first time, we ensure
       -> FW status is only set to RUNNING if proxy init was good (since
          GSC FW can't be accessed to do anything (such as hdcp, pxp, etc)
          without proxy init completion).
       -> print a message to signal if proxy init failed.
This is the only reason why we have the additional
"if (actions & GSC_ACTION_FW_LOAD)" check inside the SW Proxy block - we are
not mixing fw loading steps at all, but just distinguishing between the
first-ever sw-proxy vs the regular runtime sw-proxy, where the latter can
fail gracefully without blocking future GSC-fw operation or future sw-proxy
handling.
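
The three rules recapped above can be sketched as a small stand-alone model.
This is only an illustration under stated assumptions, not the i915 worker:
every name here (gsc_model, gsc_work_model, the ACTION_* and FW_* values) is
hypothetical, and the load and proxy results are passed in as plain booleans.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the i915 names discussed above. */
enum { ACTION_FW_LOAD = 1 << 0, ACTION_SW_PROXY = 1 << 1 };

enum fw_status { FW_PENDING, FW_RUNNING, FW_LOAD_FAIL };

struct gsc_model {
	enum fw_status status;
};

/*
 * Model of one worker pass. 'load_ok' and 'proxy_ok' stand in for the
 * results of the firmware load and of the proxy request handler.
 */
static void gsc_work_model(struct gsc_model *gsc, unsigned int actions,
			   bool load_ok, bool proxy_ok)
{
	if (actions & ACTION_FW_LOAD) {
		/* rule 1: one-time firmware load */
		if (!load_ok) {
			gsc->status = FW_LOAD_FAIL;
			return;
		}
		/* rule 2: a successful load always queues the one-time proxy init */
		actions |= ACTION_SW_PROXY;
	}

	if (actions & ACTION_SW_PROXY) {
		if (!proxy_ok) {
			/* rule 3: only a first-ever proxy init failure is fatal */
			if (actions & ACTION_FW_LOAD)
				gsc->status = FW_LOAD_FAIL;
			return;
		}
		/* mark the firmware usable only once the first proxy init passed */
		if (actions & ACTION_FW_LOAD)
			gsc->status = FW_RUNNING;
	}
}
```

In this model a failed runtime proxy request (ACTION_SW_PROXY alone) leaves
the status untouched, which is the "fail gracefully" case described above.
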
That said, to ensure we can properly honor all 3 steps above, we can either
call intel_gsc_proxy_request_handler twice (once right after fw-load and
every other time runtime proxy events occur) ... or ... we can set
GSC_ACTION_SW_PROXY twice... basically it's the same thing - so it won't make
much difference.

More importantly, this patch is not changing how and where we call
intel_gsc_proxy_request_handler but only optimizes how we handle the
GET_PARAM:PXP_STATUS. In this specific code block we are reviewing, the only
change being done is to ensure that we treat the GSC FW status as failed if
the first-time proxy-init step fails. So once again, if we want to change how
we call intel_gsc_proxy_request_handler, that would have to be another patch,
but in light of the above recap of the rules this worker is attempting to
honor, i don't agree that we need to change when we call
intel_gsc_proxy_request_handler.
...alan

> > diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
> > index 6f11d7eaa91a..1b2ee98a158a 100644
> > --- a/drivers/gpu/drm/i915/i915_getparam.c
> > +++ b/drivers/gpu/drm/i915/i915_getparam.c
> > @@ -105,7 +105,7 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
> >  			return value;
> >  		break;
> >  	case I915_PARAM_PXP_STATUS:
> > -		value = intel_pxp_get_readiness_status(i915->pxp);
> > +		value = intel_pxp_get_readiness_status(i915->pxp, 1);
> >  		if (value < 0)
> >  			return value;
> >  		break;
> > diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c b/drivers/gpu/drm/i915/pxp/intel_pxp.c
> > index bb2e15329f34..1478bb9b4e26 100644
> > --- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
> > +++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
> > @@ -359,21 +359,38 @@ void intel_pxp_end(struct intel_pxp *pxp)
> >  	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
> >  }
> >
> > +static bool pxp_required_fw_failed(struct intel_pxp *pxp)
> > +{
> > +	if (__intel_uc_fw_status(&pxp->ctrl_gt->uc.huc.fw) == INTEL_UC_FIRMWARE_LOAD_FAIL)
> > +		return true;
> > +	if (HAS_ENGINE(pxp->ctrl_gt, GSC0) &&
> > +	    __intel_uc_fw_status(&pxp->ctrl_gt->uc.gsc.fw) == INTEL_UC_FIRMWARE_LOAD_FAIL)
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> >  /*
> >   * this helper is used by both intel_pxp_start and by
> >   * the GET_PARAM IOCTL that user space calls. Thus, the
> >   * return values here should match the UAPI spec.
> >   */
> > -int intel_pxp_get_readiness_status(struct intel_pxp *pxp)
> > +int intel_pxp_get_readiness_status(struct intel_pxp *pxp, int timeout)
> >  {
> >  	if (!intel_pxp_is_enabled(pxp))
> >  		return -ENODEV;
> >
> > +	if (pxp_required_fw_failed(pxp))
> > +		return -ENODEV;
> > +
> > +	if (pxp->platform_cfg_is_bad)
> > +		return -ENODEV;
> > +
> >  	if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
> > -		if (wait_for(intel_pxp_gsccs_is_ready_for_sessions(pxp), 250))
> > +		if (wait_for(intel_pxp_gsccs_is_ready_for_sessions(pxp), timeout))
> >  			return 2;
> >  	} else {
> > -		if (wait_for(pxp_component_bound(pxp), 250))
> > +		if (wait_for(pxp_component_bound(pxp), timeout))
> >  			return 2;
> >  	}
> >  	return 1;
> > @@ -387,7 +404,7 @@ int intel_pxp_start(struct intel_pxp *pxp)
> >  {
> >  	int ret = 0;
> >
> > -	ret = intel_pxp_get_readiness_status(pxp);
> > +	ret = intel_pxp_get_readiness_status(pxp, 250);
> >  	if (ret < 0)
> >  		return ret;
> >  	else if (ret > 1)
>
> In intel_pxp_start(), shouldn't the 250ms be defined in the struct as a
> define with a comment that explains why it is 250 vs some other number?

alan: the value 250 is a carry-forward from the previous ADL implementation,
but this number is not used for fw interaction, only to check for readiness.
That said, there is no other location where we use this value for this
purpose. Thus, i can add a #define, but it would only be used in this one
function call and no other. I can add that if u insist.
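
For illustration only, a named timeout plus the readiness ordering from the
quoted hunk could look like the stand-alone model below. The macro names,
struct pxp_model and pxp_readiness_model() are all hypothetical (the patch
itself passes bare numbers), and the wait_for() poll is reduced to a simple
comparison against a made-up "ready after N ms" field.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names; the patch itself passes bare numeric timeouts. */
#define PXP_READINESS_TIMEOUT_MS 250	/* carried over from the ADL implementation */
#define PXP_NO_WAIT_MS 0		/* pure status query, never blocks */

/* Simplified stand-in for the state the real helper inspects. */
struct pxp_model {
	bool enabled;
	bool required_fw_failed;
	bool platform_cfg_is_bad;
	int ready_after_ms;	/* when the non-i915 dependencies become ready */
};

/* Mirrors the ordering of the checks in the quoted hunk. */
static int pxp_readiness_model(const struct pxp_model *pxp, int timeout_ms)
{
	/* permanent failures: retrying will not help */
	if (!pxp->enabled || pxp->required_fw_failed || pxp->platform_cfg_is_bad)
		return -ENODEV;

	/* stands in for wait_for(<ready check>, timeout) timing out */
	if (pxp->ready_after_ms > timeout_ms)
		return 2;	/* supported but not yet ready */

	return 1;		/* supported and ready */
}
```

With dependencies that need 100 ms, a PXP_NO_WAIT_MS query reports 2 while
a PXP_READINESS_TIMEOUT_MS wait reports 1; any permanent failure reports
-ENODEV regardless of the timeout.
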
Side note: When intel_pxp_start is called as part of the GEM_CONTEXT_CREATE
call, i915 UAPI already specs that I915_CONTEXT_PARAM_PROTECTED_CONTENT can
fail with -ENXIO when dependencies are not ready and user space can retry, so
the 250 was chosen based on historical ADL reviews but won't mean anything
since the user space will have to retry anyway.

> Also in the i915_getparam_ioctl, shouldn't the timeout value be 0
> instead of 1 as this is a simple status check?

alan: yes, you are right .. I'll fix that -> get_param shouldn't wait for any
timeout.

> Also, the return value of 2 if the timeout expires seems
> counter-intuitive. I think EBUSY will be more appropriate especially
> since the IOCTL call seems to be a quick status check.

alan: The IOCTL UAPI behavior was spec'd in the past with the UMD folks and
was mirroring the only other GET_PARAM type that has a runtime status change,
where a negative value means no support and positive values mean support is
available. But we use different positive values to represent different stages
of support readiness, where 1 is fully ready and 2,3,4... mean not yet ready
for reason b, c, d... (this model scales for future hw/fw/sw readiness states
that don't exist yet without breaking backwards compatibility of the UAPI
spec). Ofc, today we only use '2' for adl/mtl-pxp. That said, we can't change
the UAPI spec now, else we'd break backward compatibility with existing SW,
since the UMD has already implemented the change to follow this spec's
meaning of negative vs 1 vs 2.

alan:snip
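
As a rough sketch of how a user-space consumer might interpret the status
values described above (negative, 1, 2 and beyond): the enum and
pxp_classify() are hypothetical names, the actual GET_PARAM ioctl call is
left out, and treating 0 as unsupported is an assumption since the
discussion only covers negative and positive values.

```c
/* Hypothetical classification of a PXP_STATUS result. */
enum pxp_support {
	PXP_UNSUPPORTED,	/* negative: no support, retrying will not help */
	PXP_READY,		/* 1: fully ready */
	PXP_NOT_YET_READY	/* 2, 3, 4...: supported, retry later */
};

/*
 * 'value' models what the query reported; a negative number stands in
 * for the error the ioctl would return (for example -ENODEV).
 */
static enum pxp_support pxp_classify(int value)
{
	/* assumption: 0 is not a defined status, so treat it as unsupported */
	if (value <= 0)
		return PXP_UNSUPPORTED;
	if (value == 1)
		return PXP_READY;
	/* only 2 is used today; larger values are left for future reasons */
	return PXP_NOT_YET_READY;
}
```

Treating every value above 1 as "not yet ready" is what would let new
readiness states be added later without breaking a consumer written this way.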