Quoting Joonas Lahtinen (2017-12-04 13:41:11)
> On Wed, 2017-11-29 at 14:05 +0000, Chris Wilson wrote:
> > History tells us that if we cannot reset the GPU now, we never will. This
> > then impacts everything that is run subsequently. On failing the reset,
> > we mark the driver as wedged, trying to prevent further execution on the
> > GPU, forcing userspace to fallback to using the CPU to update its
> > framebuffers and let the user know what happened.
> >
> > We also want to go one step further and add a taint to the kernel so that
> > any subsequent faults can be traced back to this failure. This is
> > important for igt, where if the GPU/driver fails we want to reboot and
> > restart testing rather than continue on into oblivion.
> >
> > TAINT_DIE is colloquially known as "system on fire", which seems
> > appropriate for unresponsive hardware.
> >
> > v2: Also taint if the recovery fails (again history shows us that is
> > typically fatal).
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=103514
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
>
> <SNIP>
>
> > @@ -1951,6 +1954,19 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
> >  	wake_up_bit(&error->flags, I915_RESET_HANDOFF);
> >  	return;
> >
> > +taint:
> > +	/*
> > +	 * History tells us that if we cannot reset the GPU now, we
> > +	 * never will. This then impacts everything that is run
> > +	 * subsequently. On failing the reset, we mark the driver
> > +	 * as wedged, preventing further execution on the GPU.
> > +	 * We also want to go one step further and add a taint to the
> > +	 * kernel so that any subsequent faults can be traced back to
> > +	 * this failure. This is important for igt, where if the
> > +	 * GPU/driver fails we want to reboot and restart testing
> > +	 * rather than continue on into oblivion.
> > +	 */
>
> As Marta mentioned too, how igt works on a given day is a bit too volatile
> to document in the kernel comments.

More to the point, CI now implements the described response to TAINT_DIE,
without which this is pointless (userspace sees the wedged state and either
handles it or dies; CI sees the wedged state as a challenge).
-Chris
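
For reference, a minimal sketch of how a test runner could detect the taint from userspace. It assumes the mask is read as a decimal number from /proc/sys/kernel/tainted and that TAINT_DIE is bit 7 of that mask (value 128, the 'D' flag), as in mainline kernels. The helper names are hypothetical, and this is not the actual CI implementation.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: TAINT_DIE is bit 7 of the taint mask, as in mainline. */
#define TAINT_DIE_BIT 7

/* Hypothetical helper: returns 1 if the taint mask has TAINT_DIE set. */
int taint_has_die(unsigned long mask)
{
	return (int)((mask >> TAINT_DIE_BIT) & 1UL);
}

/*
 * Hypothetical helper: parse the decimal text found in
 * /proc/sys/kernel/tainted. Returns 0 on success, -1 on bad input.
 */
int parse_taint(const char *text, unsigned long *mask)
{
	char *end;

	errno = 0;
	*mask = strtoul(text, &end, 10);
	if (errno || end == text)
		return -1;
	return 0;
}

/* Hypothetical helper: 1 if TAINT_DIE is set, 0 if not, -1 on error. */
int kernel_has_died(const char *path)
{
	char buf[64];
	unsigned long mask;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	if (parse_taint(buf, &mask))
		return -1;
	return taint_has_die(mask);
}
```

A runner built along these lines would call kernel_has_died("/proc/sys/kernel/tainted") after each test and reboot on a result of 1 rather than continue with the remaining tests.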