Re: [PATCH 3/3] drm/i915/guc: sleep on enable

On 15/10/18 12:23, Chris Wilson wrote:
Quoting Daniele Ceraolo Spurio (2018-10-15 19:33:26)


On 14/10/18 10:02, Chris Wilson wrote:
Seems like there's a missing ack before the guc is ready for commands.


I'm assuming you're running without HuC since the HuC auth H2G comes
before this one.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4981/fi-apl-guc/boot0.log
i915.enable_guc=3
<7>[    6.877175] [drm:intel_uc_fw_fetch [i915]] GuC fw fetch i915/bxt_guc_ver9_29.bin
<7>[    6.877268] [drm:intel_uc_fw_fetch [i915]] GuC fw fetch PENDING
<7>[    6.879780] [drm:intel_uc_fw_fetch [i915]] GuC fw size 146432 ptr 000000003fdb20d0
<7>[    6.879869] [drm:intel_uc_fw_fetch [i915]] GuC fw version 9.29 (wanted 9.29)
<7>[    6.880425] [drm:intel_uc_fw_fetch [i915]] GuC fw fetch SUCCESS
<7>[    6.880723] [drm:intel_uc_fw_fetch [i915]] HuC fw fetch i915/bxt_huc_ver01_07_1398.bin
<7>[    6.880807] [drm:intel_uc_fw_fetch [i915]] HuC fw fetch PENDING
<7>[    6.882529] [drm:intel_uc_fw_fetch [i915]] HuC fw size 154432 ptr 000000000aad61c4
<7>[    6.882621] [drm:intel_uc_fw_fetch [i915]] HuC fw version 1.7 (wanted 1.7)
<7>[    6.883098] [drm:intel_uc_fw_fetch [i915]] HuC fw fetch SUCCESS

What we're polling to indicate load completion (GS_UKERNEL_READY) is
definitely what the firmware uses to signal readiness. The other check
we do (GS_MIA_CORE_STATE) should only apply for rc6 scenarios. From what
I can see from the firmware code, all the initialization steps are done
before GS_UKERNEL_READY is written to the status register so there
shouldn't be any missing acks in principle.

Is the GuC returning anything in the scratch 0 register? It should be
printed out by the H2G error message. The value of the status register
(0xc000) could also provide interesting debug info.
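For anyone reading a dumped status value by hand, here is a minimal decoding sketch. The field positions and the READY encoding are written from memory of the i915 GuC register header of this period, so treat them as assumptions and check them against the tree you are running. The guc_status_* helper names are hypothetical and are not functions in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout of the GuC status register (offset 0xc000).
 * Verify against the register header in your kernel tree.
 */
#define GS_UKERNEL_SHIFT   8
#define GS_UKERNEL_MASK    (0xFFu << GS_UKERNEL_SHIFT)
#define GS_UKERNEL_READY   (0xF0u << GS_UKERNEL_SHIFT)
#define GS_MIA_SHIFT       16
#define GS_MIA_MASK        (0x07u << GS_MIA_SHIFT)
#define GS_MIA_CORE_STATE  (0x01u << GS_MIA_SHIFT)

/* Hypothetical helper: extract the uKernel state field. */
static inline uint32_t guc_status_ukernel(uint32_t status)
{
	return (status & GS_UKERNEL_MASK) >> GS_UKERNEL_SHIFT;
}

/* Hypothetical helper: extract the MIA core state field. */
static inline uint32_t guc_status_mia(uint32_t status)
{
	return (status & GS_MIA_MASK) >> GS_MIA_SHIFT;
}

/* True when the uKernel field reports the READY encoding. */
static inline bool guc_status_ready(uint32_t status)
{
	return (status & GS_UKERNEL_MASK) == GS_UKERNEL_READY;
}
```

Splitting the value this way makes it easier to tell a firmware that never reached READY from one that did and failed later.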

When do you want to know? As you are probably aware, our first
indication of failure is from wait_for_guc_preempt_report() and
the wait there on report->report_return_status timing out.

Michel asked what was the value when it timed out, but alas apl-guc was
not available for comment.
-Chris


I think I found the root cause of the issue (with the help of one of the GuC devs). The GuC suspend/resume protocol requires us to do a couple of extra steps to make sure GuC is done managing its state; waiting on the H2G return is not enough. Since we're not doing those steps correctly, GuC is still in the middle of the resume process when the preemption request arrives, which causes the failure. A patch to fix this is incoming.
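To illustrate the general point that an acknowledged request is not the same as a completed one, here is a toy model in plain C. Everything in it is hypothetical (struct fake_fw, fake_fw_send_resume, fake_fw_done, wait_resume_complete); it does not model the real GuC interface or the actual extra steps, only the pattern of polling for completion after the ack with a bounded number of attempts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy firmware model: hypothetical, not the GuC interface. */
struct fake_fw {
	bool acked;      /* request was acknowledged (the "return") */
	int ticks_left;  /* polls remaining until state handling finishes */
};

/* The ack arrives at once, but the work is still outstanding. */
static void fake_fw_send_resume(struct fake_fw *fw, int work_ticks)
{
	fw->acked = true;
	fw->ticks_left = work_ticks;
}

/* One poll of the completion state; each poll advances the model. */
static bool fake_fw_done(struct fake_fw *fw)
{
	if (fw->ticks_left > 0) {
		fw->ticks_left--;
		return false;
	}
	return fw->acked;
}

/* Bounded wait: returns 0 once complete, -1 if max_polls is exhausted. */
static int wait_resume_complete(struct fake_fw *fw, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (fake_fw_done(fw))
			return 0;
	}
	return -1;
}
```

In this model a caller that proceeds as soon as acked is set would issue its next request while ticks_left is still non-zero, which is the shape of the failure described above.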

Note that since we ensure the HW is idle before suspend, we could theoretically skip the guc_resume step as there is nothing to restore, but this is untested from the GuC side and so not recommended yet. We still need to do guc_suspend, since that step ensures that all GuC timers are correctly disabled.

I think you mentioned you were also seeing issues even outside of the suspend/resume path, so we probably have a different issue as well :(

Daniele