Yeah, we already stumbled over that internally as well.

The patch is incorrect; the problem is that we forgot to keep an extra reference on the s_fence to avoid freeing it too early.

The correct fix should be on Alex's public branch by the end of today.

Regards,
Christian.

On 17.10.2017 at 20:41, Darren Salt wrote:
> [drm:gfx_v8_0_priv_reg_irq] *ERROR* Illegal register access in command stream
> [drm] IP block:gmc_v8_0 is hung!
> [drm] IP block:gfx_v8_0 is hung!
>
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000d8
> IP: amd_sched_hw_job_reset+0x3c/0x9a
> PGD 3aedd8067 P4D 3aedd8067 PUD 3aedd9067 PMD 0
> Oops: 0000 [#1] PREEMPT SMP
> Modules linked in: cpufreq_conservative bnep bluetooth ecdh_generic serial_ir snd_hrtimer snd_seq_dummy snd_seq_midi snd_rawmidi snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device nct6775 em28xx_rc tda18271 cxd2820r joydev em28xx_dvb usb_storage em28xx tveeprom snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_pcm_oss snd_mixer_oss sp5100_tco sg snd_pcm snd_timer
> CPU: 0 PID: 34 Comm: kworker/0:1 Not tainted 4.14.0-rc4+ #3
> Hardware name: System manufacturer System Product Name/A88X-PRO, BIOS 1602 12/04/2014
> Workqueue: events amdgpu_irq_reset_work_func
> task: ffff88041caa44c0 task.stack: ffffc90000144000
> RIP: 0010:amd_sched_hw_job_reset+0x3c/0x9a
> RSP: 0018:ffffc90000147de8 EFLAGS: 00010293
> RAX: ffff88031adee850 RBX: ffff88031adee800 RCX: 0000000000000001
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88040db14c08
> RBP: ffffc90000147e08 R08: ffff88040dac71d0 R09: ffff88040dac71c0
> R10: 0000000000000000 R11: ffff880409575038 R12: ffff88040db14c08
> R13: ffff88040db14bf8 R14: ffff88040db14b50 R15: ffff8803e1575580
> FS: 0000000000000000(0000) GS:ffff88041ec00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000000d8 CR3: 00000003aec3e000 CR4: 00000000000406f0
> Call Trace:
>  amdgpu_gpu_reset+0x9b/0x55b
>  ? _raw_spin_unlock_irq+0x12/0x24
>  amdgpu_irq_reset_work_func+0x16/0x18
>  process_one_work+0x124/0x1db
>  ? rescuer_thread+0x26a/0x26a
>  worker_thread+0x19d/0x250
>  ? rescuer_thread+0x26a/0x26a
>  kthread+0xf1/0xf6
> Code: 00 41 54 4c 8d a7 b8 00 00 00 53 4c 89 e7 e8 d9 a9 26 00 49 8b 86 b0 00 00 00 48 8d 58 b0 48 8d 43 50 4c 39 e8 74 51 48 8b 73 10 <48> 8b be d8 00 00 00 48 85 ff 74 37 48 81 c6 c0 00 00 00 e8 6c
> RIP: amd_sched_hw_job_reset+0x3c/0x9a RSP: ffffc90000147de8
> CR2: 00000000000000d8
>
> Signed-off-by: Darren Salt <devspam at moreofthesa.me.uk>
> ---
>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> index 08e1332d814a..10749c0c0ca0 100644
> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> @@ -427,7 +427,7 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched)
>
>  	spin_lock(&sched->job_list_lock);
>  	list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) {
> -		if (s_job->s_fence->parent &&
> +		if (s_job->s_fence && s_job->s_fence->parent &&
>  		    dma_fence_remove_callback(s_job->s_fence->parent,
>  					      &s_job->s_fence->cb)) {
>  			dma_fence_put(s_job->s_fence->parent);
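
To illustrate the lifetime issue named in the reply (a missing extra reference on the s_fence), here is a minimal user-space sketch in C. It is not the actual amdgpu fix and does not use kernel APIs: `struct fence`, `struct job`, `fence_create()`, `fence_get()`, `fence_put()` and `fence_alive_at_reset()` are all hypothetical stand-ins for a reference-counted fence and a job that points at it. The sketch only shows why a holder that keeps its own reference still sees a live object after another owner drops theirs, whereas a borrowed pointer does not.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted fence (not dma_fence). */
struct fence {
    int refcount;
    int *freed_flag;            /* set to 1 when the object is released */
};

/* Hypothetical stand-in for a scheduler job that points at a fence. */
struct job {
    struct fence *s_fence;
};

static struct fence *fence_create(int *freed_flag)
{
    struct fence *f = malloc(sizeof(*f));

    if (!f)
        return NULL;
    f->refcount = 1;            /* the creator holds the first reference */
    f->freed_flag = freed_flag;
    *freed_flag = 0;
    return f;
}

static struct fence *fence_get(struct fence *f)
{
    f->refcount++;
    return f;
}

static void fence_put(struct fence *f)
{
    if (--f->refcount == 0) {
        *f->freed_flag = 1;     /* record the release for the demo */
        free(f);
    }
}

/*
 * Returns 1 if the fence is still alive at the point where a reset
 * path would look at job.s_fence, 0 if it was already freed, and -1
 * on allocation failure.
 */
static int fence_alive_at_reset(int hold_extra_ref)
{
    int freed = 0;
    struct fence *f = fence_create(&freed);
    struct job job = { .s_fence = NULL };
    int alive;

    if (!f)
        return -1;

    if (hold_extra_ref)
        job.s_fence = fence_get(f); /* the job owns its own reference */
    else
        job.s_fence = f;            /* borrowed pointer, no reference */

    fence_put(f);                   /* the creator drops its reference */

    alive = !freed;                 /* job.s_fence is dangling if 0 */

    if (hold_extra_ref)
        fence_put(job.s_fence);     /* the job drops its reference last */

    return alive;
}
```

Calling `fence_alive_at_reset(0)` models the borrowed-pointer case, where the fence is gone before the reset path runs; `fence_alive_at_reset(1)` models the job holding an extra reference until it is done with the fence. The dangling pointer is deliberately never dereferenced in the sketch.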