Re: v6.11-rc4 amdgpu regression from v6.10.0

OK, I hacked out a patch that allows 6.11-rc4 to boot without hanging,
by just disabling the "mes" (Micro Engine Scheduler) code paths.
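
To make "disabling the mes stuff" concrete: the reverted commit added early
returns such as `if (adev->enable_mes) return amdgpu_gfx_mes_enable_kcq(adev, xcc_id);`,
so MES-enabled parts never reach the KIQ path. The following is only a
simplified userspace model of that dispatch, not amdgpu code; `struct fake_adev`,
`enum kcq_path` and both functions are hypothetical names made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one struct amdgpu_device field used here. */
struct fake_adev {
	bool enable_mes;
};

/* Which mechanism ends up mapping the kernel compute queues (KCQs). */
enum kcq_path { KCQ_VIA_MES, KCQ_VIA_KIQ };

/* With the reverted commit: MES-enabled devices return early into MES. */
enum kcq_path enable_kcq_with_mes_branch(const struct fake_adev *adev)
{
	assert(adev != NULL);
	if (adev->enable_mes)
		return KCQ_VIA_MES;	/* models amdgpu_gfx_mes_enable_kcq() */
	return KCQ_VIA_KIQ;		/* models the kiq_map_queues path */
}

/* With the revert applied: the branch is gone, every device uses the KIQ. */
enum kcq_path enable_kcq_reverted(const struct fake_adev *adev)
{
	assert(adev != NULL);
	return KCQ_VIA_KIQ;
}
```

The boot log adds mes_v11_0 as IP block 10, which suggests enable_mes is set
on this machine, so before the revert the MES branch is the one taken.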

See the attached patch.

Yeah!

Andrew


On Tue, 20 Aug 2024 at 00:13, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>
> On Mon, Aug 19, 2024 at 9:55 AM Andrew Worsley <amworsley@xxxxxxxxx> wrote:
> >
> > The v6.11-rc4 Linux kernel hangs during amdgpu startup, whereas v6.10.0
> > is fine. I had to take a photo of the screen (see attachment), from
> > which I generated the following summary:
> >
> >     Booting Linux v6.11-rc4:
> > ...
> > amdgpu: Virtual CRAT table created for CPU
> > amdgpu: Topology: Add CPU node
> > initializing kernel modesetting (IP DISCOVERY 0x1002:0x15BF 0xF111:0x0005 0xC2).
> > register mmio base: 0x90500000
> > register mmio size: 524288
> > add ip block number 0 <soc21_common>
> > add ip block number 1 <gmc_v11_0>
> > add ip block number 2 <ih_v6_0>
> > add ip block number 3 <psp>
> > add ip block number 4 <smu>
> > add ip block number 5 <dm>
> > add ip block number 6 <gfx_v11_0>
> > add ip block number 7 <sdma_v6_0>
> > add ip block number 8 <vcn_v4_0>
> > add ip block number 9 <jpeg_v4_0>
> > add ip block number 10 <mes_v11_0>
> > amdgpu 0000:c1:00.0: amdgpu: Fetched VBIOS from VFCT
> > amdgpu: ATOM BIOS: 113-PHXGENERIC-001
> > amdgpu 0000:c1:00.0: Direct firmware load for amdgpu/gc_11_0_1_mes_2.bin failed with error -2
> > amdgpu 0000:c1:00.0: amdgpu: try to fall back to amdgpu/gc_11_0_1_mes.bin
....
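
The last two quoted log lines show a firmware fallback: the preferred image
amdgpu/gc_11_0_1_mes_2.bin is missing (error -2 is -ENOENT, "No such file or
directory"), so the driver retries with amdgpu/gc_11_0_1_mes.bin. A minimal
userspace sketch of that try-then-fall-back pattern, using plain fopen()
rather than the kernel firmware loader; `try_load()` and `load_with_fallback()`
are hypothetical helpers, and this sketch falls back on any failure of the
first load, which may not match the driver's exact condition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Open one file; return 0 on success or -errno on failure (kernel style). */
static int try_load(const char *path)
{
	FILE *f = fopen(path, "rb");

	if (!f)
		return -errno;
	fclose(f);
	return 0;
}

/*
 * Try the preferred image first; if that fails for any reason, log the
 * error and retry with the fallback name. *used receives the name that
 * loaded, or NULL if neither did.
 */
int load_with_fallback(const char *preferred, const char *fallback,
		       const char **used)
{
	assert(preferred != NULL && fallback != NULL && used != NULL);

	int r = try_load(preferred);
	if (r == 0) {
		*used = preferred;
		return 0;
	}
	fprintf(stderr, "load of %s failed with error %d, trying %s\n",
		preferred, r, fallback);
	r = try_load(fallback);
	*used = r ? NULL : fallback;
	return r;
}
```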
From 535c5a73b945615bd1ea90db1d6d331fa9677252 Mon Sep 17 00:00:00 2001
From: Andrew Worsley <amworsley@xxxxxxxxx>
Date: Tue, 20 Aug 2024 16:37:36 +1000
Subject: [PATCH] Fix amdgpu hang on boot by reverting
 f9d8c5c7855d8f3e4c3e678777d02a49046eafb0.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Revert "drm/amdgpu/gfx: enable mes to map legacy queue support"
Disable the MES code paths. With this revert my AMD Ryzen™ 7040 Series Framework 16-inch laptop no longer hangs on boot.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 44 ++-----------------------
 1 file changed, 2 insertions(+), 42 deletions(-)
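
The removed MES loops in the diff address rings through a flattened index,
`j = i + xcc_id * num_rings`. Assuming, as that formula implies, that each XCC
owns a contiguous block of `num_rings` entries in `compute_ring[]` and
`gfx_ring[]`, the mapping can be sketched as follows; `ring_slot()` is a
hypothetical helper, not an amdgpu function.

```c
#include <assert.h>

/*
 * Hypothetical helper: flatten (xcc_id, i) into an index into a ring array
 * where XCC n owns entries [n * num_rings, (n + 1) * num_rings).
 */
int ring_slot(int xcc_id, int i, int num_rings)
{
	assert(xcc_id >= 0 && num_rings > 0);
	assert(i >= 0 && i < num_rings);
	return i + xcc_id * num_rings;
}
```

For example, with 8 compute rings per XCC, ring 5 of XCC 2 lands in slot
2 * 8 + 5 = 21.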

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index c770cb201e64..f2fe7874c6da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -509,16 +509,6 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev, int xcc_id)
 	int i, r = 0;
 	int j;
 
-	if (adev->enable_mes) {
-		for (i = 0; i < adev->gfx.num_compute_rings; i++) {
-			j = i + xcc_id * adev->gfx.num_compute_rings;
-			amdgpu_mes_unmap_legacy_queue(adev,
-						   &adev->gfx.compute_ring[j],
-						   RESET_QUEUES, 0, 0);
-		}
-		return 0;
-	}
-
 	if (!kiq->pmf || !kiq->pmf->kiq_unmap_queues)
 		return -EINVAL;
 
@@ -561,18 +551,6 @@ int amdgpu_gfx_disable_kgq(struct amdgpu_device *adev, int xcc_id)
 	int i, r = 0;
 	int j;
 
-	if (adev->enable_mes) {
-		if (amdgpu_gfx_is_master_xcc(adev, xcc_id)) {
-			for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
-				j = i + xcc_id * adev->gfx.num_gfx_rings;
-				amdgpu_mes_unmap_legacy_queue(adev,
-						      &adev->gfx.gfx_ring[j],
-						      PREEMPT_QUEUES, 0, 0);
-			}
-		}
-		return 0;
-	}
-
 	if (!kiq->pmf || !kiq->pmf->kiq_unmap_queues)
 		return -EINVAL;
 
@@ -657,9 +635,6 @@ int amdgpu_gfx_enable_kcq(struct amdgpu_device *adev, int xcc_id)
 	uint64_t queue_mask = 0;
 	int r, i, j;
 
-	if (adev->enable_mes)
-		return amdgpu_gfx_mes_enable_kcq(adev, xcc_id);
-
 	if (!kiq->pmf || !kiq->pmf->kiq_map_queues || !kiq->pmf->kiq_set_resources)
 		return -EINVAL;
 
@@ -678,10 +653,9 @@ int amdgpu_gfx_enable_kcq(struct amdgpu_device *adev, int xcc_id)
 		queue_mask |= (1ull << amdgpu_queue_mask_bit_to_set_resource_bit(adev, i));
 	}
 
-	amdgpu_device_flush_hdp(adev, NULL);
-
 	DRM_INFO("kiq ring mec %d pipe %d q %d\n", kiq_ring->me, kiq_ring->pipe,
-		 kiq_ring->queue);
+							kiq_ring->queue);
+	amdgpu_device_flush_hdp(adev, NULL);
 
 	spin_lock(&kiq->ring_lock);
 	r = amdgpu_ring_alloc(kiq_ring, kiq->pmf->map_queues_size *
@@ -719,20 +693,6 @@ int amdgpu_gfx_enable_kgq(struct amdgpu_device *adev, int xcc_id)
 
 	amdgpu_device_flush_hdp(adev, NULL);
 
-	if (adev->enable_mes) {
-		for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
-			j = i + xcc_id * adev->gfx.num_gfx_rings;
-			r = amdgpu_mes_map_legacy_queue(adev,
-							&adev->gfx.gfx_ring[j]);
-			if (r) {
-				DRM_ERROR("failed to map gfx queue\n");
-				return r;
-			}
-		}
-
-		return 0;
-	}
-
 	spin_lock(&kiq->ring_lock);
 	/* No need to map kcq on the slave */
 	if (amdgpu_gfx_is_master_xcc(adev, xcc_id)) {
-- 
2.39.2
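
The largest removed hunk is the MES branch of amdgpu_gfx_enable_kgq(), which
maps each gfx ring through amdgpu_mes_map_legacy_queue() and returns the first
error. This sketch reproduces only that control flow in userspace;
`struct ring`, `fake_map()` and `map_gfx_rings()` are hypothetical, and the
simulated -ETIMEDOUT failure is an assumption for illustration, not a claim
about why the real hang occurs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical ring state for the simulation. */
struct ring {
	bool mapped;	/* set once the (simulated) map succeeds */
	bool broken;	/* simulation knob: force this ring's map to fail */
};

typedef int (*map_fn)(struct ring *ring);

/* Hypothetical stand-in for amdgpu_mes_map_legacy_queue(): 0 or -errno. */
int fake_map(struct ring *ring)
{
	if (ring->broken)
		return -ETIMEDOUT;
	ring->mapped = true;
	return 0;
}

/*
 * Map the num_rings rings owned by xcc_id, stopping at the first failure,
 * in the same shape as the removed enable_kgq MES loop.
 */
int map_gfx_rings(struct ring *rings, int num_rings, int xcc_id, map_fn map)
{
	assert(rings != NULL && map != NULL && num_rings >= 0 && xcc_id >= 0);
	for (int i = 0; i < num_rings; i++) {
		int j = i + xcc_id * num_rings;
		int r = map(&rings[j]);

		if (r) {
			fprintf(stderr, "failed to map gfx queue %d (%d)\n", j, r);
			return r;
		}
	}
	return 0;
}
```

Returning on the first failure leaves any earlier rings mapped; the removed
kernel loop behaves the same way, returning r without unmapping anything.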

