[AMD Official Use Only]

> -----Original Message-----
> From: Salvatore Bonaccorso <salvatore.bonaccorso@xxxxxxxxx> On Behalf
> Of Salvatore Bonaccorso
> Sent: Sunday, February 13, 2022 2:24 AM
> To: Deucher, Alexander <Alexander.Deucher@xxxxxxx>
> Cc: Dominique Dumont <dod@xxxxxxxxxx>; 1005005@xxxxxxxxxxxxxxx;
> Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Quan, Evan
> <Evan.Quan@xxxxxxx>; Sasha Levin <sashal@xxxxxxxxxx>; Koenig, Christian
> <Christian.Koenig@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>; David
> Airlie <airlied@xxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>;
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic
> in suspend (v2)") on suspend?
>
> Hi Alex, hi all
>
> In Debian we got a regression report from Dominique Dumont, CC'ed in
> https://bugs.debian.org/1005005, that after an update to a 5.15.15-based
> kernel, his machine no longer suspends correctly: after the screen goes
> black as usual, it comes back. The Debian bug above contains a trace.
>
> Dominique confirmed that this issue persisted after updating to 5.16.7;
> furthermore, he bisected the issue and found
>
> 3c196f05666610912645c7c5d9107706003f67c3 is the first bad commit
> commit 3c196f05666610912645c7c5d9107706003f67c3
> Author: Alex Deucher <alexander.deucher@xxxxxxx>
> Date:   Fri Nov 12 11:25:30 2021 -0500
>
>     drm/amdgpu: always reset the asic in suspend (v2)
>
>     [ Upstream commit daf8de0874ab5b74b38a38726fdd3d07ef98a7ee ]
>
>     If the platform suspend happens to fail and the power rail
>     is not turned off, the GPU will be in an unknown state on
>     resume, so reset the asic so that it will be in a known
>     good state on resume even if the platform suspend failed.
>
>     v2: handle s0ix
>
>     Acked-by: Luben Tuikov <luben.tuikov@xxxxxxx>
>     Acked-by: Evan Quan <evan.quan@xxxxxxx>
>     Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
>     Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> to be the first bad commit, see
> https://bugs.debian.org/1005005#34 .

I checked the back trace posted there (below). It seems the error occurred
during amdgpu_device_suspend(). That means Alex's patch should not be
related, as it affects only the logic after amdgpu_device_suspend(). So we
might have got the wrong regression point here.

[  257.842851]  ? vi_common_set_clockgating_state+0x229/0x2f0 [amdgpu]
[  257.843356]  amdgpu_device_ip_suspend_phase1+0x5e/0xc0 [amdgpu]
[  257.843771]  amdgpu_device_suspend+0x62/0xc0 [amdgpu]
[  257.844184]  amdgpu_pmops_suspend+0x36/0x70 [amdgpu]
[  257.844631]  pci_pm_suspend+0x71/0x160
[  257.844643]  ? pci_pm_freeze+0xb0/0xb0

BR
Evan

>
> Does this ring any bell? Any idea on the problem?
>
> Regards,
> Salvatore