Can you try the attached patch (with no other patches applied)? I
think it should fix the issue.

Alex

On Sat, Jan 25, 2025 at 1:38 PM Marco Moock <mm@xxxxxxxxxx> wrote:
>
> On 24.01.2025 at 16:40:37, Alex Deucher wrote:
>
> > On Fri, Jan 24, 2025 at 9:17 AM Marco Moock <mm@xxxxxxxxxx> wrote:
> > >
> > > On 20.01.2025 at 11:35:07, Alex Deucher wrote:
> > >
> > > > On Thu, Jan 16, 2025 at 11:57 AM Marco Moock <mm@xxxxxxxxxx>
> > > > wrote:
> > > > >
> > > > > On 16.01.2025 at 11:32:42, Alex Deucher wrote:
> > > > >
> > > > > > I'd like to see the driver messages leading up to that.
> > > > >
> > > > > I've now attached the entire dmesg without the firewall stuff.
> > > >
> > > > Does the attached test patch help?
> > >
> > > I've now compiled a kernel with the patch.
> > > It doesn't change the freeze problem.
> >
> > Thanks,
> >
> > Does setting amdgpu.ppfeaturemask=0xfff73fff on the kernel command
> > line in grub help?
>
> No crash anymore.
>
>
> --
> Regards,
> Marco
>
> Send unsolicited bulk mail to 1737733237muell@xxxxxxxxxxxxxx
From 858d00fcd43366cd4af68cf464f2e26395a7578e Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@xxxxxxx>
Date: Tue, 28 Jan 2025 10:02:49 -0500
Subject: [PATCH] drm/amdgpu/gfx9: disallow gfxoff when doing KCQ reset

Should fix hangs when resetting KCQs on raven APUs.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861
Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 6aa713cfa2f3e..4fe97f3382a64 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -7302,6 +7302,7 @@ static int gfx_v9_0_reset_kcq(struct amdgpu_ring *ring,
 	if (r)
 		return r;
 
+	amdgpu_gfx_off_ctrl(adev, false);
 	/* make sure dequeue is complete*/
 	amdgpu_gfx_rlc_enter_safe_mode(adev, 0);
 	mutex_lock(&adev->srbm_mutex);
@@ -7316,6 +7317,7 @@ static int gfx_v9_0_reset_kcq(struct amdgpu_ring *ring,
 	soc15_grbm_select(adev, 0, 0, 0, 0, 0);
 	mutex_unlock(&adev->srbm_mutex);
 	amdgpu_gfx_rlc_exit_safe_mode(adev, 0);
+	amdgpu_gfx_off_ctrl(adev, true);
 	if (r) {
 		dev_err(adev->dev, "fail to wait on hqd deactive\n");
 		return r;
-- 
2.48.1