Re: [PATCH v7 5/8] drm/i915/pxp: Add ARB session creation and cleanup

"Ceraolo Spurio, Daniele" <daniele.ceraolospurio@xxxxxxxxx> · Mon, 17 Apr 2023 12:15:42 -0700






On 4/17/2023 11:21 AM, Teres Alexis, Alan Previn wrote:
On Mon, 2023-04-10 at 10:14 -0700, Ceraolo Spurio, Daniele wrote:


alan:snip

@@ -354,8 +368,14 @@ int intel_pxp_start(struct intel_pxp *pxp)
   	if (!intel_pxp_is_enabled(pxp))
   		return -ENODEV;
   
-	if (wait_for(pxp_component_bound(pxp), 250))
-		return -ENXIO;
+	if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
+		/* Use a larger 1 second timeout considering all the dependencies. */
+		if (wait_for(intel_pxp_gsccs_is_ready_for_sessions(pxp), 1000))
+			return -ENXIO;
This needs a comment to explain that we expect userspace to retry later
if we return -ENXIO and therefore the timeout is smaller that what would
be required to guarantee that the init can complete. It also needs an
ack from the userspace drivers for this behavior.

alan: I agree with a requirement to comment this down. However, i believe its better
to put this int the UAPI header file comment-documentation since it applies to both
current MTL as well as previous ADL cases (this is not new behavior being introduced
in MTL but it is fixing of the spec of existing behavior).
That said, your feedback is already being addressed by patch #6 but i will ammend
patch #6 to add the detail you highlighted WRT ~"timeout is not larger than the completion-guarantee...".

Can you move that comment update for the context getparam from the next 
patch to this one? that way it's all nicely self-contained in this patch.

alan:snip
+fw_err_to_string(u32 type)
+{
+	switch (type) {
+	case PXP_STATUS_ERROR_API_VERSION:
+		return "ERR_API_VERSION";
+	case PXP_STATUS_NOT_READY:
+		return "ERR_NOT_READY";
+	case PXP_STATUS_PLATFCONFIG_KF1_NOVERIF:
+	case PXP_STATUS_PLATFCONFIG_KF1_BAD:
+		return "ERR_PLATFORM_CONFIG";
+	default:
+		break;
+	}
+	return NULL;
+}
+
There is a lot of replication for this error handling, I'm wondering if
it's worth adding a common function to handle this. Something like:

intel_pxp_header_error(u32 header, const char *op, u32 id)
{
	if (is_fw_err_platform_config(header.status)) {
		drm_info_once(&i915->drm,
			      "PXP %s-%d failed due to BIOS/SOC:0x%08x:%s\n",
			      op, id, header.status,
			      fw_err_to_string(header.status));
	} else {
		drm_dbg(&i915->drm, "PXP %s-%d failed 0x%08x:%st:\n",
			op, id, header.status,
			fw_err_to_string(header.status));
		drm_dbg(&i915->drm, "     cmd-detail: ID=[0x%08x],API-Ver-[0x%08x]\n",
			header.command_id, header.api_version);
	}
}


Not a blocker.
alan: Thanks - i will have to address as a stand alone patch since i have to
think about moving this to a comment helper layer (common to both ADL-mei-comp and MTL-gsccs)
while potentially have different set of error codes that can mean different reporting levels
(i.e. notice once coz its a platform limit vs always err out if its runtime related).
Once this series gets merged it will be easier to work on that patch (which would require both
backends to be present in the baseline).
alan:snip
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
@@ -10,14 +10,18 @@
   
   struct intel_pxp;
   
-#define GSC_REPLY_LATENCY_MS 200
+#define GSC_REPLY_LATENCY_MS 210
why move from 200 to 210? not a problem, I just want to understand why
this is required.

Daniele
alan: so 200 is based on the fw spec - and from real testing i observed the occasional highs touching ~185ms.
However, the spec gives this number as the expected max time the firmware is going to take to process the request
and post a reply. Therefore it doesn't include the additional overhead for the request creation, emision,
submission via guc and the completion pipeline completion indication. All of these contribute additional layers
of possible delay. Since arb-session creation and invalidation are not common events,
I believe a slightly wider overhead of 10 milisec will not hurt.

Agreed. Can you add a comment? something like "Max FW response time is 
200ms, to which we add 10ms to account for overhead".
With those 2 nits addressed:

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>

Daniele