From: Ivan Mironov <mironov.ivan@xxxxxxxxx> commit 7e89e4aaa9ae83107d059c186955484b3aa6eb23 upstream. I updated my system with Radeon VII from kernel 5.6 to kernel 5.7, and following started to happen on each boot: ... BUG: kernel NULL pointer dereference, address: 0000000000000128 ... CPU: 9 PID: 1940 Comm: modprobe Tainted: G E 5.7.2-200.im0.fc32.x86_64 #1 Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020 RIP: 0010:lock_bus+0x42/0x60 [amdgpu] ... Call Trace: i2c_smbus_xfer+0x3d/0xf0 i2c_default_probe+0xf3/0x130 i2c_detect.isra.0+0xfe/0x2b0 ? kfree+0xa3/0x200 ? kobject_uevent_env+0x11f/0x6a0 ? i2c_detect.isra.0+0x2b0/0x2b0 __process_new_driver+0x1b/0x20 bus_for_each_dev+0x64/0x90 ? 0xffffffffc0f34000 i2c_register_driver+0x73/0xc0 do_one_initcall+0x46/0x200 ? _cond_resched+0x16/0x40 ? kmem_cache_alloc_trace+0x167/0x220 ? do_init_module+0x23/0x260 do_init_module+0x5c/0x260 __do_sys_init_module+0x14f/0x170 do_syscall_64+0x5b/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ... Error appears when some i2c device driver tries to probe for devices using adapter registered by `smu_v11_0_i2c_eeprom_control_init()`. Code supporting this adapter requires `adev->psp.ras.ras` to be not NULL, which is true only when `amdgpu_ras_init()` detects HW support by calling `amdgpu_ras_check_supported()`. Before 9015d60c9ee1, adapter was registered by -> amdgpu_device_ip_init() -> amdgpu_ras_recovery_init() -> amdgpu_ras_eeprom_init() -> smu_v11_0_i2c_eeprom_control_init() after verifying that `adev->psp.ras.ras` is not NULL in `amdgpu_ras_recovery_init()`. Currently it is registered unconditionally by -> amdgpu_device_ip_init() -> pp_sw_init() -> hwmgr_sw_init() -> vega20_smu_init() -> smu_v11_0_i2c_eeprom_control_init() Fix simply adds HW support check (ras == NULL => no support) before calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`. Please note that there is a chance that similar fix is also required for CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware without RAS exist, and whether calling `smu_i2c_eeprom_init()` makes any sense when there is no HW support. Cc: stable@xxxxxxxxxxxxxxx Fixes: 9015d60c9ee1 ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device") Signed-off-by: Ivan Mironov <mironov.ivan@xxxxxxxxx> Tested-by: Bjorn Nostvold <bjorn.nostvold@xxxxxxxxx> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) --- a/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c +++ b/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c @@ -508,9 +508,11 @@ static int vega20_smu_init(struct pp_hwm priv->smu_tables.entry[TABLE_ACTIVITY_MONITOR_COEFF].version = 0x01; priv->smu_tables.entry[TABLE_ACTIVITY_MONITOR_COEFF].size = sizeof(DpmActivityMonitorCoeffInt_t); - ret = smu_v11_0_i2c_eeprom_control_init(&adev->pm.smu_i2c); - if (ret) - goto err4; + if (adev->psp.ras.ras) { + ret = smu_v11_0_i2c_eeprom_control_init(&adev->pm.smu_i2c); + if (ret) + goto err4; + } return 0; @@ -546,7 +548,8 @@ static int vega20_smu_fini(struct pp_hwm (struct vega20_smumgr *)(hwmgr->smu_backend); struct amdgpu_device *adev = hwmgr->adev; - smu_v11_0_i2c_eeprom_control_fini(&adev->pm.smu_i2c); + if (adev->psp.ras.ras) + smu_v11_0_i2c_eeprom_control_fini(&adev->pm.smu_i2c); if (priv) { amdgpu_bo_free_kernel(&priv->smu_tables.entry[TABLE_PPTABLE].handle,