Bjørn Mork <bjorn@xxxxxxx> writes:

> Hello,
>
> I've been having occasional GPU HANGs on my Skylake laptop ever since I
> got it, originally reported here:
> https://bugs.freedesktop.org/show_bug.cgi?id=96894

Several similar bugs have been resolved recently. I apologize for
missing this one. I'll update this bug with a request for more
information.

> But this is not the reason I try this list. The HANGs used to be
> resolved nicely by the driver up to and including v4.8. The GPU was
> reset and that was that. A noticeable hang for a few seconds, and the
> usual log messages, but that was it. I could easily live with it.
>
> v4.9-rcX changed that, making the HANGs a real show stopper problem: The
> GPU reset started failing. From the log messages, it looks like the reset
> times out and is repeated every 20th second "forever". Something will
> give up and kill the X server in the end, resolving the hang with an X
> server restart.
>
> [19308.656674] [drm] GPU HANG: ecode 9:0:0x84dfbffc, in Xorg [1171], reason: Hang on render ring, action: reset
> [19308.656769] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
> [19308.656770] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
> [19308.656771] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
> [19308.656772] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
> [19308.656773] [drm] GPU crash dump saved to /sys/class/drm/card0/error
> [19308.657131] drm/i915: Resetting chip after gpu hang
> [19308.657752] [drm] RC6 on
> [19308.677139] [drm] GuC firmware load skipped
> [19328.645312] drm/i915: Resetting chip after gpu hang
> [19328.649380] [drm] RC6 on
> [19328.668497] [drm] GuC firmware load skipped
> [19348.612672] drm/i915: Resetting chip after gpu hang
> [19348.613017] [drm] RC6 on
> [19348.630830] [drm] GuC firmware load skipped
> [19364.612475] drm/i915: Resetting chip after gpu hang
> [19364.614544] [drm] RC6 on
> [19364.629781] [drm] GuC firmware load skipped
> [19382.660101] drm/i915: Resetting chip after gpu hang
> [19382.660955] [drm] RC6 on
> [19382.680661] [drm] GuC firmware load skipped
> [19402.628876] drm/i915: Resetting chip after gpu hang
> [19402.629229] [drm] RC6 on
> [19402.643134] [drm] GuC firmware load skipped
> [19422.660054] drm/i915: Resetting chip after gpu hang
> [19422.660419] [drm] RC6 on
> [19422.675415] [drm] GuC firmware load skipped
> [19440.644097] drm/i915: Resetting chip after gpu hang
> [19440.644558] [drm] RC6 on
> [19440.663878] [drm] GuC firmware load skipped
> [19458.627752] drm/i915: Resetting chip after gpu hang
> [19458.634024] [drm] RC6 on
> [19458.650700] [drm] GuC firmware load skipped
> [19478.659877] drm/i915: Resetting chip after gpu hang
> [19478.665303] [drm] RC6 on
> [19478.684634] [drm] GuC firmware load skipped
> [19498.627632] drm/i915: Resetting chip after gpu hang
> [19498.634862] [drm] RC6 on
> [19498.653638] [drm] GuC firmware load skipped
> [19510.659670] drm/i915: Resetting chip after gpu hang
> [19510.665894] [drm] RC6 on
> [19510.680479] [drm] GuC firmware load skipped
>
>
> Having a multi minute hang followed by losing every running X client is
> obviously a lot worse than a simple GPU reset. This makes the i915
> driver after v4.8 unusable to me...
>
> The earliest v4.9-rc I tested was v4.9-rc5, so that's the earliest
> version I know has this issue.
> The issue is still present in v4.10-rc4.
>
> I would love to be able to be more precise about when this bug was
> introduced, but the triggering HANG issues are just rare enough to make
> anything like git bisect impossible. The current frequency is only once
> or twice a week. More than enough to make me lose my hair, but far from
> often enough for any systematic testing of versions or patches.
>
> Trying to force a HANG by writing to /sys/kernel/debug/dri/0/i915_wedged
> did not have the same effect. This only caused a single reset message
> and everything was immediately OK. Possibly because I don't know what
> mask to write to i915_wedged. Is there any way to figure that
> out based on the /sys/class/drm/card0/error from the real hang? Or any
> other way to guess it?
>
>
> Please let me know if there is anything I can do to debug this problem
> further, or if there are known workarounds.
>
>
>
> Bjørn
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx