Hi Greg,

On Sat, Oct 22, 2022 at 09:39:26AM +0200, Greg KH wrote:
> On Fri, Oct 21, 2022 at 08:14:04AM +0200, Salvatore Bonaccorso wrote:
> > Hi,
> >
> > On Fri, Oct 21, 2022 at 02:29:22AM +0200, Diederik de Haas wrote:
> > > On Thursday, 20 October 2022 17:38:56 CEST Alex Deucher wrote:
> > > > This reverts commit 9f55f36f749a7608eeef57d7d72991a9bd557341.
> > > >
> > > > This patch was backported incorrectly when Sasha backported it and
> > > > the patch that caused the regression that this patch set fixed
> > > > was reverted in commit 412b844143e3 ("Revert "PCI/portdrv: Don't disable AER
> > > > reporting in get_port_device_capability()""). This isn't necessary and
> > > > causes a regression so drop it.
> > > >
> > > > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2216
> > > > Cc: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>
> > > > Cc: Sasha Levin <sashal@xxxxxxxxxx>
> > > > Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
> > > > Cc: <stable@xxxxxxxxxxxxxxx> # 5.10
> > > > ---
> > >
> > > I build a kernel with these 2 patches reverted and can confirm that that fixes
> > > the issue on my machine with a Radeon RX Vega 64 GPU.
> > > # lspci -nn | grep VGA
> > > 0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/
> > > ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c1)
> > >
> > > So feel free to add
> > >
> > > Tested-By: Diederik de Haas <didi.debian@xxxxxxxxx>
> >
> > Note additionally (probably only relevant for Greg while reviewing),
> > that the first of the commits which need to be reverted is already
> > queued as revert in queue-5.10.
>
> Yeah, this series does not apply to the current 5.10 queue at all.
>
> And I am totally confused as to what to do here.
>
> Can someone please just send me a set of patches, on top of the current
> 5.10 stable queue that works?
> Or just wait for after the next 5.10.y
> release next week and then send me a working set of patches if you don't
> like to mess with the queue format?

The problem is "only" that the first of the commits is already present
in the queue, as 1bd9462d17de ("Revert "drm/amdgpu: move nbio
sdma_doorbell_range() into sdma code for vega"") but with a different
commit message (the one from Alex Deucher would have the advantage of
also referencing the upstream bug at
https://gitlab.freedesktop.org/drm/amd/-/issues/2216).

The second commit then applies cleanly on top, so it follows inline
in this message.

Regards,
Salvatore

From 6a0b925deb55c5a0b5a27cb4a05b73f4663451a8 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@xxxxxxx>
Date: Thu, 20 Oct 2022 11:38:57 -0400
Subject: [PATCH] Revert "drm/amdgpu: make sure to init common IP before gmc"

This reverts commit 7b0db849ea030a70b8fb9c9afec67c81f955482e.

The patches that this patch depends on were not backported properly
and the patch that caused the regression that this patch set fixed
was reverted in commit 412b844143e3 ("Revert "PCI/portdrv: Don't disable AER
reporting in get_port_device_capability()""). This isn't necessary and
causes a regression so drop it.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2216
Cc: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>
Cc: Sasha Levin <sashal@xxxxxxxxxx>
Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> # 5.10
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 881045e600af..bde0496d2f15 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2179,16 +2179,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 		}
 		adev->ip_blocks[i].status.sw = true;
 
-		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_COMMON) {
-			/* need to do common hw init early so everything is set up for gmc */
-			r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
-			if (r) {
-				DRM_ERROR("hw_init %d failed %d\n", i, r);
-				goto init_failed;
-			}
-			adev->ip_blocks[i].status.hw = true;
-		} else if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC) {
-			/* need to do gmc hw init early so we can allocate gpu mem */
+		/* need to do gmc hw init early so we can allocate gpu mem */
+		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC) {
 			/* Try to reserve bad pages early */
 			if (amdgpu_sriov_vf(adev))
 				amdgpu_virt_exchange_data(adev);
@@ -2770,8 +2762,8 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
 	int i, r;
 
 	static enum amd_ip_block_type ip_order[] = {
-		AMD_IP_BLOCK_TYPE_COMMON,
 		AMD_IP_BLOCK_TYPE_GMC,
+		AMD_IP_BLOCK_TYPE_COMMON,
 		AMD_IP_BLOCK_TYPE_PSP,
 		AMD_IP_BLOCK_TYPE_IH,
 	};
-- 
2.37.2