On Thu, 18 Mar 2021 at 17:04, Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
>
> A new Kconfig option CONFIG_DRM_I915_REQUEST_TIMEOUT is added, defaulting
> to 20s, and this timeout is applied to all user contexts using the
> previously added watchdog facility.
>
> The result is that any user submission will simply fail after this
> timeout, either causing a reset (for non-preemptable), or incomplete
> results.
>
> This can have the effect that workloads which used to work fine will
> suddenly start failing. Even workloads comprised of short batches but in
> long dependency chains can be terminated.
>
> And because of the lack of agreement on the usefulness and safety of fence
> error propagation, this partial execution can be invisible to userspace
> even if it is "listening" to the returned fence status.
>
> Another interaction is with hangcheck, where care needs to be taken that
> the timeout is not set lower than or close to three times the heartbeat
> interval. Otherwise a hang in any application can cause complete
> termination of all submissions from unrelated clients. Any users modifying
> the per-engine heartbeat intervals therefore need to be aware of this
> potential denial of service to avoid inadvertently enabling it.
>
> Given all this I am personally not convinced the scheme is a good idea.
> Intuitively it feels that object importers would be better positioned to
> enforce the time they are willing to wait for something to complete.
>
> v2:
>  * Improved commit message and Kconfig text.
>  * Pull in some helper code from patch which got dropped.
>
> v3:
>  * Bump timeout to 20s to see if it helps Tigerlake.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>

Acked-by: Matthew Auld <matthew.auld@xxxxxxxxx>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx