[Bug 107762] [Intel GFX CI] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137

bugzilla-daemon@xxxxxxxxxxxxxxx · Thu, 06 Sep 2018 15:16:07 +0000

Michel Dänzer changed bug 107762:

What | Removed | Added
CC   |         | ckoenig.leichtzumerken@gmail.com, dev@lynxeye.de

Comment #2 on bug 107762 from Michel Dänzer

(In reply to Martin Peres from comment #0)
> [  358.292609] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137
> [  358.292635] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=145, emitted seq=145

(In reply to Martin Peres from comment #1)
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=147, emitted seq=147

Hmm, the signalled and emitted sequence numbers are always the same, i.e. every
fence that was emitted has also signalled. Does that mean the hardware hasn't
actually timed out?
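
For anyone who wants to check a longer log for the same pattern, here is a rough sketch, not part of the driver or of any existing tool. It assumes the message format quoted above; the name classify_timeouts is made up for illustration. It flags every timeout line on which signaled and emitted match.

```python
import re

# Matches the amdgpu_job_timedout message format quoted in this bug.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def classify_timeouts(log_text):
    """Return (ring, signaled, emitted, all_signaled) for each timeout line.

    all_signaled is True when signaled == emitted, i.e. no emitted fence
    was still pending when the timeout was reported.
    """
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match["signaled"])
        emitted = int(match["emitted"])
        results.append((match["ring"], signaled, emitted, signaled == emitted))
    return results


if __name__ == "__main__":
    # The two lines quoted from comment #0.
    sample = (
        "[  358.292609] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 "
        "timeout, signaled seq=137, emitted seq=137\n"
        "[  358.292635] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 "
        "timeout, signaled seq=145, emitted seq=145\n"
    )
    for ring, signaled, emitted, all_signaled in classify_timeouts(sample):
        verdict = "nothing pending" if all_signaled else "fences outstanding"
        print(f"{ring}: signaled={signaled} emitted={emitted} ({verdict})")
```

Feeding it the full dmesg output should show whether any reported timeout had fences still outstanding.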

I can think of two possibilities:

* A GPU scheduler bug causing the job timeout handling to be triggered
spuriously. (Could something be stalling the system work queue, so the items
scheduled by drm_sched_job_finish_cb can't call drm_sched_job_finish in time?)

* A problem with the handling of the GPU's interrupts. Do the numbers on the
amdgpu line in /proc/interrupts still increase after these messages appear,
or at least in the ten seconds before they appear?
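
One way to answer the /proc/interrupts question is to sample the counters twice and compare. The following is only a sketch under assumptions: it assumes the relevant lines contain the string "amdgpu" and that the per-CPU counters are the numeric fields right after the IRQ label. The helper names amdgpu_irq_count and sample_irq_delta are made up for illustration.

```python
import time


def amdgpu_irq_count(interrupts_text):
    """Sum the per-CPU counters on every line that mentions amdgpu."""
    total = 0
    for line in interrupts_text.splitlines():
        if "amdgpu" not in line:
            continue
        fields = line.split()
        # fields[0] is the IRQ label (e.g. "42:"); the per-CPU counters
        # follow and end at the first non-numeric field.
        for field in fields[1:]:
            if not field.isdigit():
                break
            total += int(field)
    return total


def sample_irq_delta(interval_s=10.0, path="/proc/interrupts"):
    """Return how many amdgpu interrupts arrived during interval_s seconds."""
    with open(path) as f:
        before = amdgpu_irq_count(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = amdgpu_irq_count(f.read())
    return after - before


if __name__ == "__main__":
    print("amdgpu interrupts in the last 10 s:", sample_irq_delta())
```

A delta of zero while the GPU is supposed to be busy would point towards the interrupt handling rather than the scheduler.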

