On Tue, May 25, 2021 at 5:05 PM Christian König <ckoenig.leichtzumerken@xxxxxxxxx> wrote: > > Hi Daniel, > > Am 25.05.21 um 15:05 schrieb Daniel Vetter: > > Hi Christian, > > > > On Sat, May 22, 2021 at 10:30:19AM +0200, Christian König wrote: > >> Am 21.05.21 um 20:31 schrieb Daniel Vetter: > >> This works by adding the fence of the last eviction DMA operation to BOs > >> when their backing store is newly allocated. That's what the > >> ttm_bo_add_move_fence() function you stumbled over is good for: https://elixir.bootlin.com/linux/v5.13-rc2/source/drivers/gpu/drm/ttm/ttm_bo.c#L692 > >> > >> Now the problem is it is possible that the application is terminated before > >> it can complete it's command submission. But since resource management only > >> waits for the shared fences when there are some there is a chance that we > >> free up memory while it is still in use. > > Hm where is this code? Would help in my audit that I wanted to do this > > week? If you look at most other places like > > drm_gem_fence_array_add_implicit() I mentioned earlier, then we don't > > treat the shared fences special and always also include the exclusive one. > > See amdgpu_gem_object_close(): > > ... > fence = dma_resv_get_excl(bo->tbo.base.resv); > if (fence) { > amdgpu_bo_fence(bo, fence, true); > fence = NULL; > } > ... > > We explicitly added that because resource management of some other > driver was going totally bananas without that. > > But I'm not sure which one that was. Maybe dig a bit in the git and > mailing history of that. Hm I looked and it's commit 82c416b13cb7d22b96ec0888b296a48dff8a09eb Author: Christian König <christian.koenig@xxxxxxx> Date: Thu Mar 12 12:03:34 2020 +0100 drm/amdgpu: fix and cleanup amdgpu_gem_object_close v4 That sounded more like amdgpu itself needing this, not another driver? But looking at amdgpu_vm_bo_update_mapping() it seems to pick the right fencing mode for gpu pte clearing, so I'm really not sure what the bug was that you worked around here?The implementation boils down to amdgpu_sync_resv() which syncs for the exclusive fence, always. And there's nothing else that I could find in public history at least, no references to bug reports or anything. I think you need to dig internally, because as-is I'm not seeing the problem here. Or am I missing something here? -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch