Am 08.11.18 um 17:19 schrieb Eric Anholt: > "Koenig, Christian" <Christian.Koenig@xxxxxxx> writes: > >> Am 08.11.18 um 17:04 schrieb Eric Anholt: >>> This reverts commit 0efd2d2f68cd5dbddf4ecd974c33133257d16a8e. Fixes >>> this failure in V3D GPU reset: >>> >>> [ 1418.227796] Unable to handle kernel NULL pointer dereference at virtual address 00000018 >>> [ 1418.235947] pgd = dc4c55ca >>> [ 1418.238695] [00000018] *pgd=80000040004003, *pmd=00000000 >>> [ 1418.244132] Internal error: Oops: 206 [#1] SMP ARM >>> [ 1418.248934] Modules linked in: >>> [ 1418.252001] CPU: 0 PID: 10253 Comm: kworker/0:0 Not tainted 4.19.0-rc6+ #486 >>> [ 1418.259058] Hardware name: Broadcom STB (Flattened Device Tree) >>> [ 1418.265002] Workqueue: events drm_sched_job_timedout >>> [ 1418.269986] PC is at dma_fence_remove_callback+0x8/0x50 >>> [ 1418.275218] LR is at drm_sched_job_timedout+0x4c/0x118 >>> ... >>> [ 1418.415891] [<c086b754>] (dma_fence_remove_callback) from [<c06e7e6c>] (drm_sched_job_timedout+0x4c/0x118) >>> [ 1418.425571] [<c06e7e6c>] (drm_sched_job_timedout) from [<c0242500>] (process_one_work+0x2c8/0x7bc) >>> [ 1418.434552] [<c0242500>] (process_one_work) from [<c0242a38>] (worker_thread+0x44/0x590) >>> [ 1418.442663] [<c0242a38>] (worker_thread) from [<c0249b10>] (kthread+0x160/0x168) >>> [ 1418.450076] [<c0249b10>] (kthread) from [<c02010ac>] (ret_from_fork+0x14/0x28) >>> >>> Cc: Christian König <christian.koenig@xxxxxxx> >>> Cc: Nayan Deshmukh <nayan26deshmukh@xxxxxxxxx> >>> Cc: Alex Deucher <alexander.deucher@xxxxxxx> >>> Signed-off-by: Eric Anholt <eric@xxxxxxxxxx> >> Well NAK. The problem here is that fence->parent is NULL which is most >> likely caused by an issue somewhere else. >> >> We could easily work around that with an extra NULL check, but reverting >> the patch would break GPU recovery again. > My GPU recovery works with the revert and reliably doesn't work without > it, so my idea of "break GPU recovery" is the opposite of yours. Can > you help figure out what in this change broke my driver? The problem is here: > - list_for_each_entry_reverse(job, &sched->ring_mirror_list, node) { > - struct drm_sched_fence *fence = job->s_fence; > - > - if (!dma_fence_remove_callback(fence->parent, &fence->cb)) > - goto already_signaled; dma_fence_remove_callback() will fault if fence->parent is NULL. A simple "if (!fence->parent) continue;" should be enough to work around that. But I'm not sure how exactly fence->parent became NULL in the first place. Going to double check the code once more, Christian. _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel