On 18/10/18 02:13, Chris Wilson wrote:Quoting Michal Wajdeczko (2018-10-18 00:22:43)On Thu, 18 Oct 2018 01:09:19 +0200, Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx> wrote:On 17/10/18 13:29, Chris Wilson wrote:Propagate the timeout on transferring the fw back to the caller where itmay act upon it, usually by restarting the xfer before failing.Did you see any case where we failed the xfer and didn't get a timeout out of guc_wait_ucode? that'd be quite weirdYes the logs show the xfer error but not the wait error. So we ended up returning 0.The guc status register is correctly cleared by HW on guc reset (just checked that), so if the wait_ucode succeeded in matching a "ready" value in there it means that the guc FW had loaded correctly. Maybe it managed to complete the xfer just after the timeout elapsed or the DMA got confused and reported the wrong status. Still weird, but your changes should help make the error more visible.
maybe first timeout in DMA transfer was mitigated by additional wait in guc_wait_ucode() - so maybe this error propagation is not that important ?
Anyway, definitely better and cleaner than before: Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx> small nitpick belowTestcase: igt/drv_selftest/live_hangcheck Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Cc: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx> --- drivers/gpu/drm/i915/intel_guc_fw.c | 23 ++++++----------------- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c b/drivers/gpu/drm/i915/intel_guc_fw.c index a9e6fcce467c..b68a05892dab 100644 --- a/drivers/gpu/drm/i915/intel_guc_fw.c +++ b/drivers/gpu/drm/i915/intel_guc_fw.c @@ -125,22 +125,17 @@ static void guc_prepare_xfer(struct intel_guc *guc) }/* Copy RSA signature from the fw image to HW for verification */-static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)+static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma){ struct drm_i915_private *dev_priv = guc_to_i915(guc); - struct intel_uc_fw *guc_fw = &guc->fw; - struct sg_table *sg = vma->pages; u32 rsa[UOS_RSA_SCRATCH_COUNT]; int i; - if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa), - guc_fw->rsa_offset) != sizeof(rsa)) - return -EINVAL; + sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents, + rsa, sizeof(rsa), guc->fw.rsa_offset); for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++) I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]); - - return 0; } /* @@ -251,17 +246,11 @@ static int guc_fw_xfer(struct intel_uc_fw *guc_fw, struct i915_vma *vma)* by the DMA engine in one operation, whereas the RSA signature is* loaded via MMIO. */ - ret = guc_xfer_rsa(guc, vma); - if (ret) - DRM_WARN("GuC firmware signature xfer error %d\n", ret); + guc_xfer_rsa(guc, vma); ret = guc_xfer_ucode(guc, vma); - if (ret) - DRM_WARN("GuC firmware code xfer error %d\n", ret); - - ret = guc_wait_ucode(guc); - if (ret) - DRM_ERROR("GuC firmware xfer error %d\n", ret);With the logs gone we don't have any more error message to understand where exactly we hit issue (xfer vs wait_ucode). We do have debug logs printing the status that would give the info, but might be worth retaining at least 1 error-level log.+1 for leaving error-level logs (here or at helper functions) also, please let CI re-run this patch with GuC enabled ;)I disagree since the errors are noise as the error is propagated and handled by the caller -- so not strictly errors, but maybe DRM_DEBUG_DRIVER(). -ChrisTrue, we don't want an error message if the caller decides that a failure can be expected and retries. we already have DRM_DEBUG_DRIVER printing status inside the GuC wait and GuC xfer functions, those should be enough.Daniele
_______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx