Applied. Thanks!

Alex

On Thu, Jun 25, 2020 at 1:14 PM Ivan Mironov <mironov.ivan@xxxxxxxxx> wrote:
>
> I updated my system with a Radeon VII from kernel 5.6 to kernel 5.7, and
> the following started to happen on each boot:
>
>     ...
>     BUG: kernel NULL pointer dereference, address: 0000000000000128
>     ...
>     CPU: 9 PID: 1940 Comm: modprobe Tainted: G E 5.7.2-200.im0.fc32.x86_64 #1
>     Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020
>     RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
>     ...
>     Call Trace:
>      i2c_smbus_xfer+0x3d/0xf0
>      i2c_default_probe+0xf3/0x130
>      i2c_detect.isra.0+0xfe/0x2b0
>      ? kfree+0xa3/0x200
>      ? kobject_uevent_env+0x11f/0x6a0
>      ? i2c_detect.isra.0+0x2b0/0x2b0
>      __process_new_driver+0x1b/0x20
>      bus_for_each_dev+0x64/0x90
>      ? 0xffffffffc0f34000
>      i2c_register_driver+0x73/0xc0
>      do_one_initcall+0x46/0x200
>      ? _cond_resched+0x16/0x40
>      ? kmem_cache_alloc_trace+0x167/0x220
>      ? do_init_module+0x23/0x260
>      do_init_module+0x5c/0x260
>      __do_sys_init_module+0x14f/0x170
>      do_syscall_64+0x5b/0xf0
>      entry_SYSCALL_64_after_hwframe+0x44/0xa9
>     ...
>
> The error appears when some i2c device driver tries to probe for devices
> using the adapter registered by `smu_v11_0_i2c_eeprom_control_init()`.
> The code supporting this adapter requires `adev->psp.ras.ras` to be
> non-NULL, which is true only when `amdgpu_ras_init()` detects HW support by
> calling `amdgpu_ras_check_supported()`.
>
> Before 9015d60c9ee1, the adapter was registered by
>
> -> amdgpu_device_ip_init()
>   -> amdgpu_ras_recovery_init()
>     -> amdgpu_ras_eeprom_init()
>       -> smu_v11_0_i2c_eeprom_control_init()
>
> after verifying that `adev->psp.ras.ras` is not NULL in
> `amdgpu_ras_recovery_init()`. Currently it is registered
> unconditionally by
>
> -> amdgpu_device_ip_init()
>   -> pp_sw_init()
>     -> hwmgr_sw_init()
>       -> vega20_smu_init()
>         -> smu_v11_0_i2c_eeprom_control_init()
>
> The fix simply adds a HW support check (ras == NULL => no support) before
> calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`.
>
> Please note that there is a chance that a similar fix is also required for
> CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware without
> RAS exists, and whether calling `smu_i2c_eeprom_init()` makes any sense
> when there is no HW support.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 9015d60c9ee1 ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device")
> Signed-off-by: Ivan Mironov <mironov.ivan@xxxxxxxxx>
> Tested-by: Bjorn Nostvold <bjorn.nostvold@xxxxxxxxx>
> ---
> Changelog:
>
> v1:
>   - Added "Tested-by" for another user who used this patch to fix the
>     same issue.
>
> v0:
>   - Patch introduced.
> ---
>  drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
> index 2fb97554134f..c2e0fbbccf56 100644
> --- a/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
> @@ -522,9 +522,11 @@ static int vega20_smu_init(struct pp_hwmgr *hwmgr)
>  	priv->smu_tables.entry[TABLE_ACTIVITY_MONITOR_COEFF].version = 0x01;
>  	priv->smu_tables.entry[TABLE_ACTIVITY_MONITOR_COEFF].size = sizeof(DpmActivityMonitorCoeffInt_t);
>
> -	ret = smu_v11_0_i2c_eeprom_control_init(&adev->pm.smu_i2c);
> -	if (ret)
> -		goto err4;
> +	if (adev->psp.ras.ras) {
> +		ret = smu_v11_0_i2c_eeprom_control_init(&adev->pm.smu_i2c);
> +		if (ret)
> +			goto err4;
> +	}
>
>  	return 0;
>
> @@ -560,7 +562,8 @@ static int vega20_smu_fini(struct pp_hwmgr *hwmgr)
>  			(struct vega20_smumgr *)(hwmgr->smu_backend);
>  	struct amdgpu_device *adev = hwmgr->adev;
>
> -	smu_v11_0_i2c_eeprom_control_fini(&adev->pm.smu_i2c);
> +	if (adev->psp.ras.ras)
> +		smu_v11_0_i2c_eeprom_control_fini(&adev->pm.smu_i2c);
>
>  	if (priv) {
>  		amdgpu_bo_free_kernel(&priv->smu_tables.entry[TABLE_PPTABLE].handle,
> --
> 2.26.2
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
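
For anyone reading this thread later, the guard the patch adds can be modelled outside the kernel. The following is only a rough userspace sketch, not driver code: `struct fake_adev`, `struct ras_ctx`, the `FAKE_*` return codes and every `fake_*` function are hypothetical stand-ins invented for illustration. The `ras` field plays the role of `adev->psp.ras.ras`, and `fake_i2c_xfer()` returns an error code at the point where, per the report above, the real code hits the NULL pointer dereference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes, for this sketch only. */
enum {
	FAKE_OK = 0,
	FAKE_WOULD_OOPS = -1,	/* adapter registered but RAS context is NULL */
	FAKE_NO_ADAPTER = -2	/* adapter never registered, nothing to probe */
};

/* Hypothetical stand-in for the RAS context. */
struct ras_ctx {
	int unused;
};

/* Hypothetical stand-in for struct amdgpu_device. */
struct fake_adev {
	struct ras_ctx *ras;	/* models adev->psp.ras.ras; NULL => no HW support */
	bool i2c_registered;	/* models whether the EEPROM I2C adapter exists */
};

/* Models smu_v11_0_i2c_eeprom_control_init(): registers the adapter. */
int fake_i2c_init(struct fake_adev *adev)
{
	adev->i2c_registered = true;
	return FAKE_OK;
}

/* Models smu_v11_0_i2c_eeprom_control_fini(): unregisters the adapter. */
void fake_i2c_fini(struct fake_adev *adev)
{
	adev->i2c_registered = false;
}

/* Models another i2c driver probing the adapter after registration. */
int fake_i2c_xfer(const struct fake_adev *adev)
{
	if (!adev->i2c_registered)
		return FAKE_NO_ADAPTER;
	if (!adev->ras)
		return FAKE_WOULD_OOPS;	/* the reported oops happens here */
	return FAKE_OK;
}

/* Behaviour before the patch: the adapter is registered unconditionally. */
int fake_smu_init_unguarded(struct fake_adev *adev)
{
	return fake_i2c_init(adev);
}

/* Behaviour with the patch: register only when RAS is supported. */
int fake_smu_init(struct fake_adev *adev)
{
	if (adev->ras)
		return fake_i2c_init(adev);
	return FAKE_OK;
}

/* Teardown mirrors init so fini never runs for an unregistered adapter. */
void fake_smu_fini(struct fake_adev *adev)
{
	if (adev->ras)
		fake_i2c_fini(adev);
}
```

With `ras == NULL`, `fake_smu_init_unguarded()` followed by `fake_i2c_xfer()` yields `FAKE_WOULD_OOPS`, while `fake_smu_init()` leaves the adapter unregistered so there is nothing for other drivers to probe. Guarding both init and fini with the same condition keeps the two paths symmetric, which is the same choice the patch makes.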