Re: Regression in i915 for 4.11-rc1 - bisected to commit 69df05e11ab8

On 03/23/2017 03:44 PM, Chris Wilson wrote:
> On Thu, Mar 23, 2017 at 01:19:43PM -0500, Larry Finger wrote:
>> Since kernel 4.11-rc1, my desktop (Plasma5/KDE) has encountered
>> intermittent hangs with the following information in the logs:
>>
>> linux-4v1g.suse kernel: [drm] GPU HANG: ecode 7:0:0xf3cffffe, in
>> plasmashell [1283], reason: Hang on render ring, action: reset
>> linux-4v1g.suse kernel: [drm] GPU hangs can indicate a bug anywhere
>> in the entire gfx stack, including userspace.
>> linux-4v1g.suse kernel: [drm] Please file a _new_ bug report on
>> bugs.freedesktop.org against DRI -> DRM/Intel
>> linux-4v1g.suse kernel: [drm] drm/i915 developers can then reassign
>> to the right component if it's not a kernel issue.
>> linux-4v1g.suse kernel: [drm] The gpu crash dump is required to
>> analyze gpu hangs, so please always attach it.
>> linux-4v1g.suse kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
>> linux-4v1g.suse kernel: drm/i915: Resetting chip after gpu hang
>>
>> This problem was added to
>> https://bugs.freedesktop.org/show_bug.cgi?id=99380, but it probably
>> is a different bug, as the OP in that report has problems with
>> kernel 4.10.x, whereas my problem did not appear until 4.11.
>
> Close. Actually that patch touches code you are not using (oa-perf and
> gvt), the real culprit was e8a9c58fcd9a ("drm/i915: Unify active context
> tracking between legacy/execlists/guc").
>
> The fix
>
> commit 5d4bac5503fcc67dd7999571e243cee49371aef7
> Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Date:   Wed Mar 22 20:59:30 2017 +0000
>
>     drm/i915: Restore marking context objects as dirty on pinning
>
>     Commit e8a9c58fcd9a ("drm/i915: Unify active context tracking between
>     legacy/execlists/guc") converted the legacy intel_ringbuffer submission
>     to the same context pinning mechanism as execlists - that is to pin the
>     context until the subsequent request is retired. Previously it used the
>     vma retirement of the context object to keep itself pinned until the
>     next request (after i915_vma_move_to_active()). In the conversion, I
>     missed that the vma retirement was also responsible for marking the
>     object as dirty. Mark the context object as dirty when pinning
>     (equivalent to execlists) which ensures that if the context is swapped
>     out due to mempressure or suspend/hibernation, when it is loaded back in
>     it does so with the previous state (and not all zero).
>
>     Fixes: e8a9c58fcd9a ("drm/i915: Unify active context tracking between legacy/execlists/guc")
>     Reported-by: Dennis Gilmore <dennis@xxxxxxxx>
>     Reported-by: Mathieu Marquer <mathieu.marquer@xxxxxxxxx>
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99993
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100181
>     Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>     Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
>     Cc: <drm-intel-fixes@xxxxxxxxxxxxxxxxxxxxx> # v4.11-rc1
>     Link: http://patchwork.freedesktop.org/patch/msgid/20170322205930.12762-1-chris@xxxxxxxxxxxxxxxxxx
>     Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
>
> went in this morning and so will be upstreamed ~next week.
> -Chris
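
For readers following the commit message above, the failure mode can be pictured with a toy model in C. This is only a sketch under stated assumptions, not the i915 code: `toy_object`, `toy_pin()`, `toy_evict()` and the other names are hypothetical, and the model assumes that an object evicted while clean is simply dropped and later reloaded from a stale (here all-zero) backing copy. Because the hardware writes the context image behind the CPU's back, the only chance to record that the contents will change is when the object is pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CTX_SIZE 16

/* Hypothetical stand-in for a swappable context object; not an i915 type. */
struct toy_object {
    unsigned char pages[CTX_SIZE];   /* resident copy the hardware writes to */
    unsigned char backing[CTX_SIZE]; /* copy kept in swap/backing storage */
    bool dirty;                      /* resident copy differs from backing */
    int pin_count;                   /* > 0 while the hardware may use it */
};

/* Pin the object for hardware use; the fix corresponds to mark_dirty = true. */
static void toy_pin(struct toy_object *obj, bool mark_dirty)
{
    obj->pin_count++;
    if (mark_dirty)
        obj->dirty = true;
}

static void toy_unpin(struct toy_object *obj)
{
    assert(obj->pin_count > 0);
    obj->pin_count--;
}

/* The hardware saves state into the resident pages; it never sets 'dirty'. */
static void toy_hw_write(struct toy_object *obj, unsigned char value)
{
    assert(obj->pin_count > 0);
    memset(obj->pages, value, CTX_SIZE);
}

/* Evict under memory pressure or suspend: write back only if dirty. */
static void toy_evict(struct toy_object *obj)
{
    assert(obj->pin_count == 0);
    if (obj->dirty) {
        memcpy(obj->backing, obj->pages, CTX_SIZE);
        obj->dirty = false;
    }
    memset(obj->pages, 0, CTX_SIZE); /* resident pages are released */
}

/* Bring the object back in from its backing copy. */
static void toy_restore(struct toy_object *obj)
{
    memcpy(obj->pages, obj->backing, CTX_SIZE);
}

/* Returns the first byte of the context after one evict/restore cycle. */
unsigned char toy_roundtrip(bool mark_dirty)
{
    struct toy_object obj = {0};

    toy_pin(&obj, mark_dirty);
    toy_hw_write(&obj, 0xAB);
    toy_unpin(&obj);
    toy_evict(&obj);
    toy_restore(&obj);
    return obj.pages[0];
}
```

In this model `toy_roundtrip(true)` gets back the byte the hardware wrote, while `toy_roundtrip(false)` gets back zero, which mirrors the "previous state (and not all zero)" wording in the commit message.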

Thanks. With a bug that is difficult to trigger, bisection is difficult. I am surprised that the only step I got wrong was the last one. BTW, my reversion failed after 20 hours. I was ready to write again when I got your fix. Good timing.

If your patch does not fix my problem, I will let you know.

Larry

