Hi Alex,

On 1/6/2024 12:11 AM, Alex Deucher wrote:
> On Fri, Jan 5, 2024 at 9:16 AM Christian König
> <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>>
>> On 21.12.23 at 02:58, Ma, Jun wrote:
>>> Hi Christian,
>>>
>>>
>>> On 12/20/2023 10:10 PM, Christian König wrote:
>>>> On 19.12.23 at 06:58, Ma Jun wrote:
>>>>> Print a warning message if the system can't access
>>>>> the resize bar register when using large bar.
>>>> Well pretty clear NAK, we have embedded use cases where this would
>>>> trigger an incorrect warning.
>>>>
>>>> What should that be good for in the first place?
>>>>
>>> Some customer platforms do not enable mmconfig for various reasons, such as
>>> bios bug, and therefore cannot access the GPU extended configuration
>>> space through mmio.
>>>
>>> Therefore, when the system enters the d3cold state and resumes,
>>> the amdgpu driver fails to resume because the extended configuration
>>> space registers of GPU can't be restored. At this point, usually we
>>> only see some failure dmesg log printed by amdgpu driver, it is
>>> difficult to find the root cause.
>>>
>>> So I thought it would be helpful to print some warning messages at
>>> the beginning to identify problems quickly.
>>
>> Interesting bug, but we can't do this here. We have a couple of devices
>> where the REBAR cap isn't enabled for some reason (or not correctly
>> enabled).
>>
>> In this case this would print a warning even when there isn't anything
>> wrong.
>>
>> What we could potentially do is to check for the MSI extension, that
>> should always be there if I'm not completely mistaken.
>>
>> But how does this hardware platform even work without the extended mmio
>> space? I mean we can't even enable/disable MSI in that configuration if
>> I'm not completely mistaken.
>
> That system is probably similar to what Mario mentioned:
> https://lore.kernel.org/linux-pci/20231215220343.22523-1-mario.limonciello@xxxxxxx/
>

Yes, it's the same problem.
Regards,
Ma Jun

> Alex
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>> Ma Jun
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Signed-off-by: Ma Jun <Jun.Ma2@xxxxxxx>
>>>>> ---
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++++++++-
>>>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> index 4b694696930e..e7aedb4bd66e 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> @@ -1417,6 +1417,12 @@ void amdgpu_device_wb_free(struct amdgpu_device *adev, u32 wb)
>>>>>  	__clear_bit(wb, adev->wb.used);
>>>>>  }
>>>>>
>>>>> +static inline void amdgpu_find_rb_register(struct amdgpu_device *adev)
>>>>> +{
>>>>> +	if (!pci_find_ext_capability(adev->pdev, PCI_EXT_CAP_ID_REBAR))
>>>>> +		DRM_WARN("System can't access the resize bar register,please check!!\n");
>>>>> +}
>>>>> +
>>>>>  /**
>>>>>   * amdgpu_device_resize_fb_bar - try to resize FB BAR
>>>>>   *
>>>>> @@ -1444,8 +1450,10 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>>>>>
>>>>>  	/* skip if the bios has already enabled large BAR */
>>>>>  	if (adev->gmc.real_vram_size &&
>>>>> -	    (pci_resource_len(adev->pdev, 0) >= adev->gmc.real_vram_size))
>>>>> +	    (pci_resource_len(adev->pdev, 0) >= adev->gmc.real_vram_size)) {
>>>>> +		amdgpu_find_rb_register(adev);
>>>>>  		return 0;
>>>>> +	}
>>>>>
>>>>>  	/* Check if the root BUS has 64bit memory resources */
>>>>>  	root = adev->pdev->bus;
>>