On 2022/11/18 17:18, Michel Dänzer wrote:
On 11/18/22 09:01, Christian König wrote:
On 18.11.22 at 08:48, Zhenneng Li wrote:
During reboot tests on an arm64 platform, boot may fail,
so add this mb() in smc.
The error messages are as follows:
[ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
[ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init failed
[ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init
Memory barriers are not supposed to be sprinkled around like this; you need to give a detailed explanation of why this is necessary.
Regards,
Christian.
Signed-off-by: Zhenneng Li <lizhenneng@xxxxxxxxxx>
---
drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
index 8f994ffa9cd1..c7656f22278d 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
@@ -155,6 +155,8 @@ bool amdgpu_si_is_smc_running(struct amdgpu_device *adev)
u32 rst = RREG32_SMC(SMC_SYSCON_RESET_CNTL);
u32 clk = RREG32_SMC(SMC_SYSCON_CLOCK_CNTL_0);
+ mb();
+
if (!(rst & RST_REG) && !(clk & CK_DISABLE))
return true;
In particular, it makes no sense in this specific place, since it cannot directly affect the values of rst & clk.
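To illustrate the point above, here is a minimal user-space sketch, not driver code. The fake registers, the helper names, and the bit positions are all assumptions made for the example, and `__atomic_thread_fence()` (a GCC/Clang builtin) only stands in for mb(). It shows that rst and clk are snapshots: whatever runs after the two reads, a barrier included, cannot change what the check sees.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions are assumed for illustration, not copied from the driver headers. */
#define RST_REG    (1u << 0)
#define CK_DISABLE (1u << 0)

/* Hypothetical stand-ins for SMC_SYSCON_RESET_CNTL and SMC_SYSCON_CLOCK_CNTL_0. */
static volatile uint32_t fake_reset_cntl;
static volatile uint32_t fake_clock_cntl;

/* Put the simulated SMC into reset with its clock enabled. */
static void fake_smc_hold_in_reset(void)
{
	fake_reset_cntl = RST_REG;
	fake_clock_cntl = 0;
}

/* Full fence (user-space stand-in for mb()), then the simulated SMC leaves reset. */
static void fence_then_smc_leaves_reset(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	fake_reset_cntl = 0;
}

/*
 * Same shape as the patched function: read both registers, run whatever
 * sits where the patch adds mb(), then test the values read earlier.
 */
static bool fake_is_smc_running(void (*after_reads)(void))
{
	uint32_t rst = fake_reset_cntl;
	uint32_t clk = fake_clock_cntl;

	if (after_reads)
		after_reads();

	/* rst and clk still hold the values from before after_reads() ran. */
	return !(rst & RST_REG) && !(clk & CK_DISABLE);
}
```

After fake_smc_hold_in_reset(), calling fake_is_smc_running(fence_then_smc_leaves_reset) returns false even though the fake register has already changed by the time the function returns; a second call with NULL returns true. The sketch says nothing about what actually goes wrong on the real hardware.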
I think so too.
But when I run reboot tests on nine desktop machines, one or two of
them may report this error after hundreds or thousands of reboots.
At first I used msleep() instead of mb(); both approaches work, but
I don't know what the root cause is.
When I use this method with another vendor's Oland card, the error
message is reported again.
What could be the root cause?
Test environment:
graphics card: OLAND 0x1002:0x6611 0x1642:0x1869 0x87
driver: amdgpu
os: Ubuntu 20.04
platform: arm64
kernel: 5.4.18