Re: amdgpu 100% CPU usage causing freeze 1002:15d8

Can you try the attached patch (with no other patches applied)?  I
think it should fix the issue.

Alex

On Sat, Jan 25, 2025 at 1:38 PM Marco Moock <mm@xxxxxxxxxx> wrote:
>
> On 24.01.2025 at 16:40:37, Alex Deucher wrote:
>
> > On Fri, Jan 24, 2025 at 9:17 AM Marco Moock <mm@xxxxxxxxxx> wrote:
> > >
> > > On 20.01.2025 at 11:35:07, Alex Deucher wrote:
> > >
> > > > On Thu, Jan 16, 2025 at 11:57 AM Marco Moock <mm@xxxxxxxxxx>
> > > > wrote:
> > > > >
> > > > > On 16.01.2025 at 11:32:42, Alex Deucher wrote:
> > > > >
> > > > > > I'd like to see the driver messages leading up to that.
> > > > >
> > > > > I've now attached the entire dmesg without the firewall stuff.
> > > >
> > > > Does the attached test patch help?
> > >
> > > I've now compiled a kernel with the patch.
> > > It doesn't change the freeze problem.
> >
> > Thanks,
> >
> > Does setting amdgpu.ppfeaturemask=0xfff73fff on the kernel command
> > line in grub help?
>
> No crash anymore.
>
>
> --
> Regards
> Marco
>
> Send unsolicited bulk mail to 1737733237muell@xxxxxxxxxxxxxx
From 858d00fcd43366cd4af68cf464f2e26395a7578e Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@xxxxxxx>
Date: Tue, 28 Jan 2025 10:02:49 -0500
Subject: [PATCH] drm/amdgpu/gfx9: disallow gfxoff when doing KCQ reset

Should fix hangs when resetting KCQs on raven APUs.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861
Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 6aa713cfa2f3e..4fe97f3382a64 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -7302,6 +7302,7 @@ static int gfx_v9_0_reset_kcq(struct amdgpu_ring *ring,
 	if (r)
 		return r;
 
+	amdgpu_gfx_off_ctrl(adev, false);
 	/* make sure dequeue is complete*/
 	amdgpu_gfx_rlc_enter_safe_mode(adev, 0);
 	mutex_lock(&adev->srbm_mutex);
@@ -7316,6 +7317,7 @@ static int gfx_v9_0_reset_kcq(struct amdgpu_ring *ring,
 	soc15_grbm_select(adev, 0, 0, 0, 0, 0);
 	mutex_unlock(&adev->srbm_mutex);
 	amdgpu_gfx_rlc_exit_safe_mode(adev, 0);
+	amdgpu_gfx_off_ctrl(adev, true);
 	if (r) {
 		dev_err(adev->dev, "fail to wait on hqd deactive\n");
 		return r;
-- 
2.48.1
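
For readers wondering why amdgpu.ppfeaturemask=0xfff73fff was worth testing: a minimal sketch, in C, of decoding which feature bits that value leaves cleared. The bit values below are assumptions taken from my reading of enum PP_FEATURE_MASK in drivers/gpu/drm/amd/include/amd_shared.h and should be checked against your own kernel tree; the helper names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bit values (my reading of enum PP_FEATURE_MASK in
 * drivers/gpu/drm/amd/include/amd_shared.h); verify against your tree.
 */
#define PP_OVERDRIVE_MASK 0x4000u
#define PP_GFXOFF_MASK    0x8000u
#define PP_GFX_DCS_MASK   0x80000u

/* The value tested in this thread. */
#define TESTED_PPFEATUREMASK 0xfff73fffu

/* Hypothetical helper: true if the GFXOFF feature bit is set in mask. */
static bool ppfeature_gfxoff_enabled(uint32_t mask)
{
	return (mask & PP_GFXOFF_MASK) != 0;
}

/* Hypothetical helper: returns the bits that are clear in mask. */
static uint32_t ppfeature_cleared_bits(uint32_t mask)
{
	return ~mask;
}

/* Sanity check: the tested value leaves the assumed GFXOFF bit cleared. */
static void ppfeature_selfcheck(void)
{
	assert(!ppfeature_gfxoff_enabled(TESTED_PPFEATUREMASK));
	assert(ppfeature_cleared_bits(TESTED_PPFEATUREMASK) ==
	       (PP_OVERDRIVE_MASK | PP_GFXOFF_MASK | PP_GFX_DCS_MASK));
}
```

If those bit assignments hold, the tested value has GFXOFF disabled, which would be consistent with the freeze going away under that setting and with the patch above keeping GFXOFF disallowed only for the duration of the KCQ reset.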

