[SNIP]
Let me take a closer look first
I think I've figured out why this isn't working as expected. It started
with this patch here:
commit 5fd8518d187ed03403a4d4f7f56f52c00b11c148
Author: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
Date:   Mon Dec 6 14:59:35 2021 -0500

    drm/amdgpu: Move scheduler init to after XGMI is ready

    Before we initialize schedulers we must know which reset
    domain are we in - for single device there iis a single
    domain per device and so single wq per device. For XGMI
    the reset domain spans the entire XGMI hive and so the
    reset wq is per hive.

    Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
    Reviewed-by: Christian König <christian.koenig@xxxxxxx>
    Link: https://www.spinics.net/lists/amd-gfx/msg74112.html
Andrey separated the scheduler initialization from the ring init because
we need some of the rings for XGMI initialization, which in turn is
necessary to figure out the XGMI hive and so the reset domain for the
scheduler.
The code inside amdgpu_ttm_set_buffer_funcs_status() is actually
correct. The problem is that it is called as part of the hw init, which
comes earlier than the scheduler init.
@Alex, any ideas how to fix this? My best guess is that we should move the
call to amdgpu_ttm_set_buffer_funcs_status() from the DMA specific code
into the higher level handling in amdgpu_device.c.
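
To make the ordering issue concrete, here is a rough standalone model of
it. This is not driver code: every toy_* name is made up for
illustration, and it assumes that enabling the buffer funcs only
succeeds once the scheduler has been initialized, which is what the
ordering problem described above implies.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the device state; NOT the real struct amdgpu_device. */
struct toy_device {
	bool rings_ready;     /* set by hw init */
	bool xgmi_ready;      /* set once the hive / reset domain is known */
	bool sched_ready;     /* set by scheduler init */
	bool buffer_funcs_on; /* set when enabling buffer funcs succeeded */
};

/*
 * Models amdgpu_ttm_set_buffer_funcs_status(). Assumption: enabling
 * needs an initialized scheduler, otherwise it fails.
 */
static int toy_set_buffer_funcs_status(struct toy_device *dev, bool enable)
{
	if (enable && !dev->sched_ready)
		return -EINVAL;
	dev->buffer_funcs_on = enable;
	return 0;
}

/* Hw init brings up the rings; optionally enables buffer funcs itself. */
static int toy_hw_init(struct toy_device *dev, bool enable_buffer_funcs)
{
	dev->rings_ready = true;
	if (enable_buffer_funcs)
		return toy_set_buffer_funcs_status(dev, true);
	return 0;
}

/* XGMI init needs some of the rings, so it must follow hw init. */
static void toy_xgmi_init(struct toy_device *dev)
{
	dev->xgmi_ready = dev->rings_ready;
}

/* Scheduler init needs the reset domain, so it must follow XGMI init. */
static void toy_sched_init(struct toy_device *dev)
{
	dev->sched_ready = dev->xgmi_ready;
}

/* Current order: the DMA specific hw init enables buffer funcs too early. */
static int toy_init_current_order(struct toy_device *dev)
{
	int r = toy_hw_init(dev, true);

	toy_xgmi_init(dev);
	toy_sched_init(dev);
	return r;
}

/* Proposed order: the device level code enables them after sched init. */
static int toy_init_proposed_order(struct toy_device *dev)
{
	int r = toy_hw_init(dev, false);

	if (r)
		return r;
	toy_xgmi_init(dev);
	toy_sched_init(dev);
	return toy_set_buffer_funcs_status(dev, true);
}
```

In this model toy_init_current_order() returns an error and leaves
buffer_funcs_on false, while toy_init_proposed_order() succeeds. Where
exactly the call would end up in amdgpu_device.c is still open.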
Regards,
Christian.