On 2017-12-13 08:44 PM, Andrey Grodzovsky wrote:
> With introduction of amdgpu_gpu_recovery we don't need any more
> to rely on amdgpu_lockup_timeout == 0 for disabling GPU reset.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>

Since this change landed, I'm once again unable to finish a piglit run
on my development machine, see the attached dmesg output (happens pretty
quickly, after ~5% of piglit tests have run).

I realized that with lockup_timeout != 0, the

    WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_SYSTEM);

at the top of amdgpu_bo_gpu_offset has been triggering since the 4.15
development cycle. See the bisection result below. Note that I'm not
100% sure this is the correct guilty commit, since it's probably been
the most painful bisection I've ever done so far (14 skips, had to
revert 4 commits causing other regressions). But I'm quite sure this
regression happened in the
84d43463a2d09c28c9222fbb7d1082c078e2523a..3f3333f8a0e90ac26f84ed7b0aa344efce695c08
range.


3f3333f8a0e90ac26f84ed7b0aa344efce695c08 is the first bad commit
commit 3f3333f8a0e90ac26f84ed7b0aa344efce695c08
Author: Christian König <christian.koenig at amd.com>
Date:   Thu Aug 3 14:02:13 2017 +0200

    drm/amdgpu: track evicted page tables v2

    Instead of validating all page tables when one was evicted,
    track which one needs a validation.

    v2: simplify amdgpu_vm_ready as well

    Signed-off-by: Christian König <christian.koenig at amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher at amd.com> (v1)
    Reviewed-by: Chunming Zhou <david1.zhou at amd.com>


--
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: amdgpu_bo_gpu_offset-WARN.diff
Type: text/x-patch
Size: 28301 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20171219/426d4056/attachment-0001.bin>