[AMD Public Use]
That's the IP discovery table. The Object header is part of atombios. Do you have an vbios image from that system?
Alex
From: Yin, Tianci (Rico) <Tianci.Yin@xxxxxxx>
Sent: Sunday, November 1, 2020 10:46 PM
To: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>; Cui, Flora <Flora.Cui@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Long, Gang <Gang.Long@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: fix NULL pointer crash on navi10 headless SKU
[AMD Public Use]
Thanks very much Hawking!
Hi Alex,
The amdgpu_device_ip_get_ip_block() depends on the adev->ip_blocks that initialized by nv_set_ip_blocks().
In nv_set_ip_blocks(), whether add dm_ip_block depends on amdgpu_device_has_dc_support().
And amdgpu_device_has_dc_support() depends on amdgpu_device_asic_has_dc_support(),
So amdgpu_device_asic_has_dc_support() can't conditional on amdgpu_device_ip_get_ip_block(), it makes a dependency loop.
I have just checked the atombios object table in the headless VBIOS, and find DCN and VCN are still there.
[ 174.815714] [drm:amdgpu_discovery_reg_base_init [amdgpu]] DMU(271) #0 v2.0.2:
[ 174.815771] [drm:amdgpu_discovery_reg_base_init [amdgpu]] 0x00000012
[ 174.815829] [drm:amdgpu_discovery_reg_base_init [amdgpu]] 0x000000c0
[ 174.815887] [drm:amdgpu_discovery_reg_base_init [amdgpu]] 0x000034c0
[ 174.815945] [drm:amdgpu_discovery_reg_base_init [amdgpu]] 0x00009000
[ 174.816002] [drm:amdgpu_discovery_reg_base_init [amdgpu]] 0x02403c00
[ 174.821854] [drm:amdgpu_discovery_reg_base_init [amdgpu]] UVD(12) #0 v2.0.0:
[ 174.821908] [drm:amdgpu_discovery_reg_base_init [amdgpu]] 0x00007800
[ 174.821962] [drm:amdgpu_discovery_reg_base_init [amdgpu]] 0x00007e00
[ 174.822017] [drm:amdgpu_discovery_reg_base_init [amdgpu]] 0x02403000
So I think DAL driver can't tell it's a normal ASIC or headless ASIC.
Thanks!
Rico
From: Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Sent: Friday, October 30, 2020 22:18
To: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Yin, Tianci (Rico) <Tianci.Yin@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>; Cui, Flora <Flora.Cui@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Long, Gang <Gang.Long@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: fix NULL pointer crash on navi10 headless SKU
[AMD Public Use]
Maybe it would be easier to check if the DCE IP exists? E.g.,
if (!amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_DCE) ||
(!amdgpu_device_has_dc_support(adev))
...
Also do we even need to skip DCN init for these cards, or will the display code handle it automatically since there will be no displays in the atombios object table. We didn't do anything special for display with the polaris blockchain cards.
Alex
From: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Sent: Friday, October 30, 2020 9:00 AM
To: Yin, Tianci (Rico) <Tianci.Yin@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>; Cui, Flora <Flora.Cui@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Long, Gang <Gang.Long@xxxxxxx>
Subject: RE: [PATCH] drm/amdgpu: fix NULL pointer crash on navi10 headless SKU
[AMD Public Use]
I see, thanks for the clarifying. The patch is
Reviewed-by: Hawking Zhang <Hawking.Zhang@xxxxxxx>
I was thinking to have a new flag like AMD_IS_HEADLESS in amd_chip_flag, but after think it a bit more, that’s just complicated the case, I agree with your current approach.
Regards,
Hawking
From: Yin, Tianci (Rico) <Tianci.Yin@xxxxxxx>
Sent: Friday, October 30, 2020 20:05
To: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>; Cui, Flora <Flora.Cui@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Long, Gang <Gang.Long@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: fix NULL pointer crash on navi10 headless SKU
[AMD Public Use]
amdgpu_device_asic_has_dc_support() is referrenced by amdgpu_pci_probe(),
at that point, adev has not been allocated yet.
My change can make it to right code path.
int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
{
if (!amdgpu_device_has_dc_support(adev))
drm_helper_hpd_irq_event(dev); //right path for headless SKU
drm_kms_helper_hotplug_event(dev); //wrong path for headless SKU
From: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Sent: Friday, October 30, 2020 19:48
To: Yin, Tianci (Rico) <Tianci.Yin@xxxxxxx>;
amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>;
Cui, Flora <Flora.Cui@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Long, Gang <Gang.Long@xxxxxxx>; Yin, Tianci (Rico) <Tianci.Yin@xxxxxxx>
Subject: RE: [PATCH] drm/amdgpu: fix NULL pointer crash on navi10 headless SKU
[AMD Public Use]
I'm not sure I get your point on changing amdgpu_device_has_dc_support() interface by adding new parameter. but it seems to me change input parameter from pdev to adev for nv_is_headless_sku is more straightforward.
Regards,
Hawking
-----Original Message-----
From: Tianci Yin <tianci.yin@xxxxxxx>
Sent: Friday, October 30, 2020 19:32
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>;
Chen, Guchun <Guchun.Chen@xxxxxxx>; Cui, Flora <Flora.Cui@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Long, Gang <Gang.Long@xxxxxxx>;
Yin, Tianci (Rico) <Tianci.Yin@xxxxxxx>
Subject: [PATCH] drm/amdgpu: fix NULL pointer crash on navi10 headless SKU
From: "Tianci.Yin" <tianci.yin@xxxxxxx>
The crash caused by the NULL pointer of
adev->ddev.mode_config.funcs in drm_kms_helper_hotplug_event(),
but this function should not be called on headless SKU.
Fix the mismatch between the return value of
amdgpu_device_has_dc_support() and the real DCN supporting state to avoid calling to drm_kms_helper_hotplug_event() in amdgpu_device_resume().
Change-Id: I3a3d387e6ab5b774abb3911ea1bf6de60797759d
Signed-off-by: Tianci.Yin <tianci.yin@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++-- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +-
drivers/gpu/drm/amd/amdgpu/nv.c | 2 +-
drivers/gpu/drm/amd/amdgpu/nv.h | 1 +
6 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ba65d4f2ab67..f0183271456f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1090,7 +1090,7 @@ void amdgpu_device_indirect_wreg64(struct amdgpu_device *adev,
u32 pcie_index, u32 pcie_data,
u32 reg_addr, u64 reg_data);
-bool amdgpu_device_asic_has_dc_support(enum amd_asic_type asic_type);
+bool amdgpu_device_asic_has_dc_support(enum amd_asic_type asic_type,
+struct pci_dev *pdev);
bool amdgpu_device_has_dc_support(struct amdgpu_device *adev);
int emu_soc_asic_init(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1fe850e0a94d..323ed69032a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2960,11 +2960,12 @@ static void amdgpu_device_detect_sriov_bios(struct amdgpu_device *adev)
* amdgpu_device_asic_has_dc_support - determine if DC supports the asic
*
* @asic_type: AMD asic type
+ * @pdev: pointer to pci_dev instance
*
* Check if there is DC (new modesetting infrastructre) support for an asic.
* returns true if DC has support, false if not.
*/
-bool amdgpu_device_asic_has_dc_support(enum amd_asic_type asic_type)
+bool amdgpu_device_asic_has_dc_support(enum amd_asic_type asic_type,
+struct pci_dev *pdev)
{
switch (asic_type) {
#if defined(CONFIG_DRM_AMD_DC)
@@ -3000,9 +3001,14 @@ bool amdgpu_device_asic_has_dc_support(enum amd_asic_type asic_type)
case CHIP_VEGA20:
#if defined(CONFIG_DRM_AMD_DC_DCN)
case CHIP_RAVEN:
+ return amdgpu_dc != 0;
case CHIP_NAVI10:
case CHIP_NAVI14:
case CHIP_NAVI12:
+ if (nv_is_headless_sku(pdev))
+ return false;
+ else
+ return amdgpu_dc != 0;
case CHIP_RENOIR:
#endif
#if defined(CONFIG_DRM_AMD_DC_DCN3_0)
@@ -3033,7 +3039,7 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev)
if (amdgpu_sriov_vf(adev) || adev->enable_virtual_display)
return false;
- return amdgpu_device_asic_has_dc_support(adev->asic_type);
+ return amdgpu_device_asic_has_dc_support(adev->asic_type, adev->pdev);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 9e92d2a070ac..97014458d7de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -516,7 +516,7 @@ uint32_t amdgpu_display_supported_domains(struct amdgpu_device *adev,
*/
if ((bo_flags & AMDGPU_GEM_CREATE_CPU_GTT_USWC) &&
amdgpu_bo_support_uswc(bo_flags) &&
- amdgpu_device_asic_has_dc_support(adev->asic_type)) {
+ amdgpu_device_asic_has_dc_support(adev->asic_type, adev->pdev)) {
switch (adev->asic_type) {
case CHIP_CARRIZO:
case CHIP_STONEY:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 4b78ecfd35f7..b23110241267 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1117,7 +1117,7 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
bool supports_atomic = false;
if (!amdgpu_virtual_display &&
- amdgpu_device_asic_has_dc_support(flags & AMD_ASIC_MASK))
+ amdgpu_device_asic_has_dc_support(flags & AMD_ASIC_MASK, pdev))
supports_atomic = true;
if ((flags & AMD_EXP_HW_SUPPORT) && !amdgpu_exp_hw_support) { diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c index 026e0a8fd526..97446ae75b0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -493,7 +493,7 @@ void nv_set_virt_ops(struct amdgpu_device *adev)
adev->virt.ops = &xgpu_nv_virt_ops;
}
-static bool nv_is_headless_sku(struct pci_dev *pdev)
+bool nv_is_headless_sku(struct pci_dev *pdev)
{
if ((pdev->device == 0x731E &&
(pdev->revision == 0xC6 || pdev->revision == 0xC7)) || diff --git a/drivers/gpu/drm/amd/amdgpu/nv.h b/drivers/gpu/drm/amd/amdgpu/nv.h index 515d67bf249f..7880ad0073c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.h
+++ b/drivers/gpu/drm/amd/amdgpu/nv.h
@@ -29,6 +29,7 @@
void nv_grbm_select(struct amdgpu_device *adev,
u32 me, u32 pipe, u32 queue, u32 vmid); void nv_set_virt_ops(struct amdgpu_device *adev);
+bool nv_is_headless_sku(struct pci_dev *pdev);
int nv_set_ip_blocks(struct amdgpu_device *adev); int navi10_reg_base_init(struct amdgpu_device *adev); int navi14_reg_base_init(struct amdgpu_device *adev);
--
2.17.1
|