> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
> Of Felix Kuehling
> Sent: Thursday, August 11, 2016 3:52 PM
> To: Michel Dänzer; Christian König
> Cc: amd-gfx at lists.freedesktop.org
> Subject: Reverted another change to fix buffer move hangs (was Re:
> [PATCH] drm/ttm: partial revert "cleanup ttm_tt_(unbind|destroy)" v2)
>
> We had to revert another change on the KFD branch to fix a buffer move
> problem: 8b6b79f43801f00ddcdc10a4d5719eba4b2e32aa (drm/amdgpu: group BOs
> by log2 of the size on the LRU v2)

That makes sense. I think you may want a different LRU scheme for KFD, or at least special handling for KFD buffers.

Alex

>
> We haven't looked into this change in detail yet, to understand the
> cause. Kent found it by bisecting on amd-staging-4.6 and applying KFD
> changes on top.
>
> Regards,
>   Felix
>
> On 16-08-05 11:06 AM, Felix Kuehling wrote:
> > For the record, Michel's patch "drm/ttm: Wait for a BO to become idle
> > before unbinding it from GTT" fixes our KFD problem as well.
> >
> > Thanks,
> >   Felix
> >
> > On 16-07-27 05:27 PM, Felix Kuehling wrote:
> >> We're also looking into a hang with a KFD unit test that allocates lots
> >> of memory and fragments it deliberately, without mapping it all at once.
> >> It's a new problem for us as we're rebasing on amd-staging-4.6.
> >> Something weird seems to be happening with evictions, but I haven't been
> >> able to figure it out.
> >>
> >> I was able to see that SDMA page table updates stop working at some
> >> point, though SDMA fences are still signaling. If I let the test run
> >> longer, SDMA and CP hang. I dumped the SDMA IBs and didn't see anything
> >> suspicious. My guess was that maybe the SDMA IBs or the ring are getting
> >> corrupted, or maybe the GART table entries for the IBs or ring are
> >> corrupted. But I haven't been able to prove that or track it down to a
> >> root cause. We're now trying to reimplement the test using libdrm-amdgpu
> >> APIs so we can bisect on the amd-staging-4.6 branch without KFD.
> >>
> >> Regards,
> >>   Felix
> >>
> >> On 16-07-26 10:26 PM, Michel Dänzer wrote:
> >>> On 22.07.2016 22:10, Christian König wrote:
> >>>> From: Christian König <christian.koenig at amd.com>
> >>>>
> >>>> We still need to unbind explicitly during a move.
> >>> This change fixed a hang for me when running the piglit test
> >>> max-texture-size with the radeon driver on Kaveri.
> >>>
> >>> However, there's still a similar hang left when letting the piglit test
> >>> tex3d-maxsize run concurrently with other tests (running tex3d-maxsize
> >>> alone doesn't hang, but fails due to running out of GPU memory; that's a
> >>> recent radeonsi regression). There are
> >>>
> >>> [TTM] Buffer eviction failed
> >>>
> >>> messages in dmesg shortly before the hang.
> >>>
> >>> I haven't seen such hangs with older kernels. Any ideas offhand what the
> >>> problem could be? If not, I'll try bisecting.
> >>>
> >>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx