On 2023-10-24 10:46, Alex Deucher wrote:
> On Tue, Oct 24, 2023 at 6:14 AM Christian König
> <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>>
>> [SNIP]
>>> Let me take a closer look first
>>
>> I think I've figured out why this isn't working as expected. It started
>> with this patch here:
>>
>> commit 5fd8518d187ed03403a4d4f7f56f52c00b11c148
>> Author: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
>> Date:   Mon Dec 6 14:59:35 2021 -0500
>>
>>     drm/amdgpu: Move scheduler init to after XGMI is ready
>>
>>     Before we initialize schedulers we must know which reset
>>     domain are we in - for single device there is a single
>>     domain per device and so single wq per device. For XGMI
>>     the reset domain spans the entire XGMI hive and so the
>>     reset wq is per hive.
>>
>>     Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
>>     Reviewed-by: Christian König <christian.koenig@xxxxxxx>
>>     Link: https://www.spinics.net/lists/amd-gfx/msg74112.html
>>
>> Andrey separated the scheduler initialization from the ring init because
>> we need some of the rings for XGMI initialization which in turn is
>> necessary to figure out the XGMI hive and so the reset domain for the
>> scheduler.
>>
>> The code inside amdgpu_ttm_set_buffer_funcs_status() is actually
>> correct, the problem is that this is called as part of the hw init which
>> comes earlier than the scheduler init.
>>
>> @Alex, Ideas how to fix this? My best guess is that we should move the
>> call to amdgpu_ttm_set_buffer_funcs_status() from the DMA specific code
>> into the higher level handling in amdgpu_device.c
>
> Yes, I think so, but there could be some tricky ordering issues with
> respect to suspend and resume. I think something like the attached
> patch should do the trick.

This patch works. I've tested suspend and resume too.

Tested-by: Luben Tuikov <luben.tuikov@xxxxxxx>

scripts/checkpatch.pl complains about extra parentheses.
-- 
Regards,
Luben
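
For illustration only, the ordering problem discussed in the quoted text can be modeled with a small standalone C sketch. This is a toy model, not the amdgpu code and not the attached patch: every name here (struct toy_device, toy_set_buffer_funcs(), toy_sched_init() and the two init paths) is hypothetical, and it assumes that enabling the buffer functions simply has no effect when the scheduler is not yet initialized.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the amdgpu driver. */
struct toy_device {
	bool sched_ready;      /* set once the scheduler has been initialized */
	bool buffer_funcs_on;  /* set once buffer functions have been enabled */
};

/*
 * Assumption for this sketch: enabling the buffer functions needs a
 * working scheduler, so the call is refused when it comes too early.
 */
static bool toy_set_buffer_funcs(struct toy_device *dev)
{
	if (!dev->sched_ready)
		return false;
	dev->buffer_funcs_on = true;
	return true;
}

static void toy_sched_init(struct toy_device *dev)
{
	dev->sched_ready = true;
}

/*
 * Ordering described in the thread: the enable call is made as part of
 * hw init, which runs before scheduler init.
 */
static bool toy_init_enable_in_hw_init(struct toy_device *dev)
{
	toy_set_buffer_funcs(dev);  /* too early, scheduler not ready */
	toy_sched_init(dev);
	return dev->buffer_funcs_on;
}

/*
 * Direction suggested in the thread: the higher level device code makes
 * the enable call once scheduler init has finished.
 */
static bool toy_init_enable_at_device_level(struct toy_device *dev)
{
	toy_sched_init(dev);
	toy_set_buffer_funcs(dev);  /* scheduler is ready by now */
	return dev->buffer_funcs_on;
}
```

Starting from a zeroed struct toy_device, the first path leaves buffer_funcs_on false and the second leaves it true. The sketch does not model suspend and resume, where the same ordering has to hold in reverse on the way down and again on the way back up.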