On 07.09.22 at 11:21, Matthew Auld wrote:
On Tue, 6 Sept 2022 at 09:54, Christian König <christian.koenig@xxxxxxx> wrote:
On 06.09.22 at 10:46, ZhenGuo Yin wrote:
[Why]
A ghost BO is released with a non-empty bulk move object, which
triggers this warning trace:
WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0 [amdttm]
Call Trace:
amddma_resv_reserve_fences+0x10d/0x1f0 [amdkcl]
amdttm_bo_put+0x28/0x30 [amdttm]
amdttm_bo_move_accel_cleanup+0x126/0x200 [amdttm]
amdgpu_bo_move+0x1a8/0x770 [amdgpu]
ttm_bo_handle_move_mem+0xb0/0x140 [amdttm]
amdttm_bo_validate+0xbf/0x100 [amdttm]
[How]
The resource of the ghost BO should be moved to the LRU directly instead
of using the bulk move. The bulk move object of the ghost BO should be set
to NULL before calling ttm_bo_move_to_lru_tail_unlocked().
v2: set bulk move to NULL manually if no resource is associated with the ghost BO
Fixes: 5b951e487fd6bf5f ("drm/ttm: fix bulk move handling v2")
Signed-off-by: ZhenGuo Yin <zhenguo.yin@xxxxxxx>
Reviewed-by: Christian König <christian.koenig@xxxxxxx>
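The [How] description can be illustrated with a minimal C model. This is a sketch under stated assumptions, not TTM code: struct bulk_move, struct resource, struct bo, set_bulk_move(), transfer_to_ghost() and release_ok() are hypothetical simplifications. The model assumes that the ghost starts out as a copy of the original BO and therefore inherits its bulk_move pointer; set_bulk_move() only loosely mirrors what ttm_bo_set_bulk_move() is used for in the patch (detach the resource from the old range, then switch the pointer), and release_ok() stands in for the release-time check behind the warning above. Locking is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TTM objects; NOT the kernel definitions. */
struct bulk_move {
	int nr_resources;		/* resources tracked in this range */
};

struct resource {
	struct bulk_move *bulk;		/* range this resource sits in, if any */
};

struct bo {
	struct resource *resource;
	struct bulk_move *bulk_move;
};

/* Hypothetical helper: detach the resource from the old range, switch the
 * pointer, then attach the resource to the new range (if any). */
static void set_bulk_move(struct bo *bo, struct bulk_move *bulk)
{
	if (bo->bulk_move == bulk)
		return;
	if (bo->resource && bo->bulk_move) {
		assert(bo->bulk_move->nr_resources > 0);
		bo->bulk_move->nr_resources--;
		bo->resource->bulk = NULL;
	}
	bo->bulk_move = bulk;
	if (bo->resource && bulk) {
		bulk->nr_resources++;
		bo->resource->bulk = bulk;
	}
}

/* Hypothetical ghost transfer with the fix applied. */
static void transfer_to_ghost(struct bo *bo, struct bo *ghost)
{
	*ghost = *bo;			/* ghost inherits bulk_move as well */
	if (ghost->resource) {
		bo->resource = NULL;
		set_bulk_move(ghost, NULL);	/* remove resource from the range */
	} else {
		ghost->bulk_move = NULL;	/* v2: nothing to detach */
	}
}

/* Stand-in for the release-time check that fired in the trace. */
static bool release_ok(const struct bo *bo)
{
	return bo->bulk_move == NULL;
}
```

For example, a BO tracked in a bulk range that hands its resource to a ghost leaves the range empty and the ghost passes the release check, while a ghost without a resource takes the v2 branch and only has its pointer cleared.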
Going to push that to drm-misc-fixes in a minute.
Thanks,
Christian.
---
drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1cbfb00c1d65..57a27847206f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -239,6 +239,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	if (fbo->base.resource) {
 		ttm_resource_set_bo(fbo->base.resource, &fbo->base);
 		bo->resource = NULL;
+		ttm_bo_set_bulk_move(&fbo->base, NULL);
This appears to blow up quite badly in i915. See here for an example trace:
https://gitlab.freedesktop.org/drm/intel/-/issues/6744
Do you know if amdgpu is also hitting this, or is this somehow i915-specific?
At least a quick test on amdgpu worked fine, but that was without
lockdep enabled.
I think I see the problem. The move of the resource and the removal of
the bulk_move must come after the dma_resv_trylock(); otherwise the
dma_resv object isn't locked.
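The ordering problem can be modelled in a few lines of C. This is a hypothetical, simplified sketch, not the follow-up patch: struct ghost, resv_trylock(), detach_bulk_move() and both transfer_*() functions are illustrative names, not kernel APIs. detach_bulk_move() refuses to run unless the reservation is held, standing in for the kind of lock-held check lockdep would flag. The two transfer functions compare the order in the posted patch with the order proposed here: initialize and trylock the ghost's reservation first, then detach the bulk move.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a ghost BO; not struct ttm_buffer_object. */
struct ghost {
	bool resv_initialized;	/* stand-in: dma_resv_init() has run */
	bool resv_locked;	/* stand-in: dma_resv_trylock() succeeded */
	bool has_resource;	/* ghost took over the BO's resource */
	bool has_bulk_move;	/* bulk_move pointer still set */
};

/* Stand-in for dma_resv_trylock(): succeeds only on an initialized,
 * currently unlocked reservation. */
static bool resv_trylock(struct ghost *g)
{
	if (!g->resv_initialized || g->resv_locked)
		return false;
	g->resv_locked = true;
	return true;
}

/* Stand-in for detaching the resource from the bulk move; returns false
 * where a lock-held assertion would fire. */
static bool detach_bulk_move(struct ghost *g)
{
	if (!g->resv_locked)
		return false;
	g->has_bulk_move = false;
	return true;
}

/* Order in the patch as posted: detach first, init and lock afterwards. */
static bool transfer_detach_then_lock(struct ghost *g)
{
	bool ok = true;

	if (g->has_resource)
		ok = detach_bulk_move(g);
	else
		g->has_bulk_move = false;	/* v2 branch: plain assignment */

	g->resv_initialized = true;		/* dma_resv_init() */
	if (!resv_trylock(g))
		ok = false;
	return ok;
}

/* Order proposed in the reply: init and lock first, then detach. */
static bool transfer_lock_then_detach(struct ghost *g)
{
	g->resv_initialized = true;		/* dma_resv_init() */
	if (!resv_trylock(g))
		return false;
	assert(g->resv_locked);			/* reservation held from here on */

	if (g->has_resource)
		return detach_bulk_move(g);
	g->has_bulk_move = false;
	return true;
}
```

In this model the posted order fails only when the ghost owns a resource, which matches the observation that the no-resource branch is a plain pointer assignment with no lock requirement.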
Going to provide a patch.
Christian.
+	} else {
+		fbo->base.bulk_move = NULL;
 	}
 	dma_resv_init(&fbo->base.base._resv);