We're also looking into a hang with a KFD unit test that allocates lots of memory and deliberately fragments it, without mapping it all at once. It's a new problem for us since we started rebasing on amd-staging-4.6. Something weird seems to be happening with evictions, but I haven't been able to figure it out.

I was able to see that SDMA page table updates stop working at some point, even though SDMA fences are still signaling. If I let the test run longer, SDMA and CP hang. I dumped the SDMA IBs and didn't see anything suspicious. My guess was that the SDMA IBs or the ring are getting corrupted, or that the GART table entries for the IBs or the ring are. But I haven't been able to prove that or track it down to a root cause.

We're now trying to reimplement the test using libdrm-amdgpu APIs so we can bisect on the amd-staging-4.6 branch without KFD.

Regards,
  Felix

On 16-07-26 10:26 PM, Michel Dänzer wrote:
> On 22.07.2016 22:10, Christian König wrote:
>> From: Christian König <christian.koenig at amd.com>
>>
>> We still need to unbind explicitly during a move.
> This change fixed a hang for me when running the piglit test
> max-texture-size with the radeon driver on Kaveri.
>
> However, there's still a similar hang left when letting the piglit test
> tex3d-maxsize run concurrently with other tests (running tex3d-maxsize
> alone doesn't hang, but fails due to running out of GPU memory; that's a
> recent radeonsi regression). There are
>
> [TTM] Buffer eviction failed
>
> messages in dmesg shortly before the hang.
>
> I haven't seen such hangs with older kernels. Any ideas offhand what the
> problem could be? If not, I'll try bisecting.