On Fri, Jun 04, 2021 at 10:33:07AM +0200, Daniel Vetter wrote:
> On Wed, Jun 02, 2021 at 10:16:23PM -0700, Matthew Brost wrote:
> > From: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
> >
> > In an upcoming patch we will allow more CTB requests to be sent in
> > parallel to the GuC for processing, so we shouldn't assume any more
> > that the GuC will always reply within 10ms.
> >
> > Use a bigger value from CONFIG_DRM_I915_GUC_CTB_TIMEOUT instead.
> >
> > v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option
> >
> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
> > Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
> > Reviewed-by: Matthew Brost <matthew.brost@xxxxxxxxx>
>
> So this is a rant, but for upstream we really need to do better than
> internal:
>
> - The driver must work by default in the optimal configuration.
>
> - Any config change that we haven't validated _must_ taint the kernel
>   (this is especially for module options, but also for config settings)
>
> - Configs need a real reason beyond "was useful for bring-up".
>
> Our internal tree is an absolute disaster right now, with multi-line
> kernel configs (different on each platform) and bespoke kernel config or
> the driver just fails. We're the experts on our own hw, we should know how
> it works, not offload that to users essentially asking them "how shitty do
> you think Intel hw is in responding timely".
>
> Yes I know there's a lot of these there already, they don't make a lot of
> sense either.
>
> Except if there's a real reason for this (aside from us just offloading
> testing to our users instead of doing it ourselves properly) I think we
> should hardcode this, with a comment explaining why. Maybe with a switch
> between the PF/VF case once that's landed.
>
> > ---
> >  drivers/gpu/drm/i915/Kconfig.profile      | 10 ++++++++++
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  5 ++++-
> >  2 files changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> > index 39328567c200..0d5475b5f28a 100644
> > --- a/drivers/gpu/drm/i915/Kconfig.profile
> > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > @@ -38,6 +38,16 @@ config DRM_I915_USERFAULT_AUTOSUSPEND
> >  	  May be 0 to disable the extra delay and solely use the device level
> >  	  runtime pm autosuspend delay tunable.
> >
> > +config DRM_I915_GUC_CTB_TIMEOUT
> > +	int "How long to wait for the GuC to make forward progress on CTBs (ms)"
> > +	default 1500 # milliseconds
> > +	range 10 60000
>
> Also range is definitely off, drm/scheduler will probably nuke you
> beforehand :-)
>
> That's kinda another issue I have with all these kconfig knobs: Maybe we
> need a knob for "relax with reset attempts, my workloads overload my gpus
> routinely", which then scales _all_ timeouts proportionally. But letting
> the user set them all, with silly combinations like resetting the
> workload before heartbeat or stuff like that doesn't make much sense.
>

Yes, with the code as is, the user could do some wacky stuff that doesn't
make sense at all.

> Anyway, tiny patch so hopefully I can leave this one out for now until
> we've closed this.

No issue leaving this out, as blocking CTBs are never really used anyways
until SRIOV, aside from setup / debugging. That being said, we might still
want a higher hardcoded value in the meantime, perhaps around a second. I
can follow up on that if needed.

Matt

> -Daniel
>
> > +	help
> > +	  Configures the default timeout waiting for the GuC to make forward
> > +	  progress on CTBs, e.g. waiting for a response to a request.
> > +
> > +	  A range of 10 ms to 60000 ms is allowed.
> > +
> >  config DRM_I915_HEARTBEAT_INTERVAL
> >  	int "Interval between heartbeat pulses (ms)"
> >  	default 2500 # milliseconds
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 916c2b80c841..cf1fb09ef766 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -436,6 +436,7 @@ static int ct_write(struct intel_guc_ct *ct,
> >   */
> >  static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >  {
> > +	long timeout;
> >  	int err;
> >
> >  	/*
> > @@ -443,10 +444,12 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> >  	 * up to that length of time, then switch to a slower sleep-wait loop.
> >  	 * No GuC command should ever take longer than 10ms.
> >  	 */
> > +	timeout = CONFIG_DRM_I915_GUC_CTB_TIMEOUT;
> > +
> >  #define done INTEL_GUC_MSG_IS_RESPONSE(READ_ONCE(req->status))
> >  	err = wait_for_us(done, 10);
> >  	if (err)
> > -		err = wait_for(done, 10);
> > +		err = wait_for(done, timeout);
> >  #undef done
> >
> >  	if (unlikely(err))
> > --
> > 2.28.0
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
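
For readers following the thread outside the kernel tree, the two-phase wait
that the patch touches (a short busy-poll, then a slower sleep-wait loop with
a longer timeout) can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the i915 code: struct fake_request,
request_done(), wait_for_reply(), CTB_SPIN_US and CTB_REPLY_TIMEOUT_MS are
hypothetical names, the 1000 ms figure merely mirrors the "around a second"
hardcoded value floated above, and the kernel's wait_for_us()/wait_for()
macros are replaced by clock_gettime() and nanosleep().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() and nanosleep() */
#endif

#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical constants: spin briefly, then sleep-wait up to about a second. */
#define CTB_SPIN_US		10
#define CTB_REPLY_TIMEOUT_MS	1000

/* Stand-in for a CT request: the "reply" arrives at a fixed point in time. */
struct fake_request {
	int64_t ready_at_ns;
};

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Stand-in for the "done" condition that the driver polls. */
static bool request_done(const struct fake_request *req)
{
	return now_ns() >= req->ready_at_ns;
}

/* Returns 0 once the reply is seen, -ETIMEDOUT if timeout_ms elapses first. */
static int wait_for_reply(const struct fake_request *req, long timeout_ms)
{
	const struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000 };
	int64_t deadline;

	assert(req != NULL);

	/* Phase 1: busy-poll for a few microseconds to catch fast replies. */
	deadline = now_ns() + CTB_SPIN_US * 1000LL;
	while (now_ns() < deadline) {
		if (request_done(req))
			return 0;
	}

	/* Phase 2: sleep-wait, checking the condition once more after expiry. */
	deadline = now_ns() + timeout_ms * 1000000LL;
	for (;;) {
		const bool expired = now_ns() > deadline;

		if (request_done(req))
			return 0;
		if (expired)
			return -ETIMEDOUT;
		nanosleep(&nap, NULL);
	}
}
```

A caller would use wait_for_reply(&req, CTB_REPLY_TIMEOUT_MS) and treat
-ETIMEDOUT as a missing reply. Sampling the expiry before the final condition
check means a reply that lands while the waiter is asleep is still reported
as success rather than as a timeout.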