On Mon, Dec 20, 2021 at 04:52:19PM -0800, John.C.Harrison@xxxxxxxxx wrote: > From: John Harrison <John.C.Harrison@xxxxxxxxx> > > There is a known (but exceedingly unlikely) race condition where the > asynchronous frequency management code could reduce the GT clock while > a GuC reload is in progress (during a full GT reset). A fix is in > progress but there are complex locking issues to be resolved. In the > meantime bump the timeout to 500ms. Even at slowest clock, this > should be sufficient. And in the working case, a larger timeout makes > no difference. > > Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx> Any idea of the ETA for the proper fix? Also if the proper fix makes the locking more complicated I'm probably of the opinion we just live with a longer timer as full GTs shouldn't really ever happen in practice and if they take a longer time, so be it. Anyways for this patch: Reviewed-by: Matthew Brost <matthew.brost@xxxxxxxxx> > --- > drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 13 +++++++++++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c > index 31420ce1ce6b..c03bde5ec61f 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c > @@ -105,12 +105,21 @@ static int guc_wait_ucode(struct intel_uncore *uncore) > /* > * Wait for the GuC to start up. > * NB: Docs recommend not using the interrupt for completion. > - * Measurements indicate this should take no more than 20ms, so a > + * Measurements indicate this should take no more than 20ms > + * (assuming the GT clock is at maximum frequency). So, a > * timeout here indicates that the GuC has failed and is unusable. > * (Higher levels of the driver may decide to reset the GuC and > * attempt the ucode load again if this happens.) > + * > + * FIXME: There is a known (but exceedingly unlikely) race condition > + * where the asynchronous frequency management code could reduce > + * the GT clock while a GuC reload is in progress (during a full > + * GT reset). A fix is in progress but there are complex locking > + * issues to be resolved. In the meantime bump the timeout to > + * 500ms. Even at slowest clock, this should be sufficient. And > + * in the working case, a larger timeout makes no difference. > */ > - ret = wait_for(guc_ready(uncore, &status), 100); > + ret = wait_for(guc_ready(uncore, &status), 500); > if (ret) { > struct drm_device *drm = &uncore->i915->drm; > > -- > 2.25.1 >