On second thought, this will break reserving bad pages when resetting the
GPU for a non-RAS error reason such as manual reset, S3 or ring timeout
(amdgpu_ras_resume->amdgpu_ras_reset_gpu), so I will keep the code as is.

Another possible issue in the existing code: it looks like no reservation
will take place in those cases even now, since in
amdgpu_ras_reserve_bad_pages data->last_reserved will be equal to
data->count, no? It looks like for this case you need to add a flag to
FORCE reservation for all pages from 0 to data->count.

Andrey

On 9/11/19 10:19 AM, Andrey Grodzovsky wrote:
> I like this much more, I will relocate to
> amdgpu_umc_process_ras_data_cb and push.
>
> Andrey
>
> On 9/10/19 11:08 PM, Zhou1, Tao wrote:
>> amdgpu_ras_reserve_bad_pages is only used by umc block, so another
>> approach is to move it into amdgpu_umc_process_ras_data_cb.
>> Anyway, either way is OK and the patch is:
>>
>> Reviewed-by: Tao Zhou <tao.zhou1@xxxxxxx>
>>
>>> -----Original Message-----
>>> From: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
>>> Sent: September 11, 2019 3:41
>>> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>> Cc: Chen, Guchun <Guchun.Chen@xxxxxxx>; Zhou1, Tao
>>> <Tao.Zhou1@xxxxxxx>; Deucher, Alexander
>>> <Alexander.Deucher@xxxxxxx>; Grodzovsky, Andrey
>>> <Andrey.Grodzovsky@xxxxxxx>
>>> Subject: [PATCH] drm/amdgpu: Fix mutex lock from atomic context.
>>>
>>> Problem:
>>> amdgpu_ras_reserve_bad_pages was moved to amdgpu_ras_reset_gpu
>>> because writing to EEPROM during ASIC reset was unstable.
>>> But for ERREVENT_ATHUB_INTERRUPT amdgpu_ras_reset_gpu is called
>>> directly from ISR context and so locking is not allowed. Also it's
>>> irrelevant for this particular interrupt as this is generic RAS
>>> interrupt and not memory errors specific.
>>>
>>> Fix:
>>> Avoid calling amdgpu_ras_reserve_bad_pages if not in task context.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
>>> index 012034d..dd5da3c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
>>> @@ -504,7 +504,9 @@ static inline int amdgpu_ras_reset_gpu(struct
>>> amdgpu_device *adev,
>>>  	/* save bad page to eeprom before gpu reset,
>>>  	 * i2c may be unstable in gpu reset
>>>  	 */
>>> -	amdgpu_ras_reserve_bad_pages(adev);
>>> +	if (in_task())
>>> +		amdgpu_ras_reserve_bad_pages(adev);
>>> +
>>>  	if (atomic_cmpxchg(&ras->in_recovery, 0, 1) == 0)
>>>  		schedule_work(&ras->recovery_work);
>>>  	return 0;
>>> --
>>> 2.7.4
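
To illustrate the FORCE-flag idea raised at the top of this message, here is a
minimal user-space sketch, not driver code. It assumes only what the thread
states: that reservation normally starts at data->last_reserved and stops at
data->count, so nothing happens once the two are equal. The struct
bad_page_data, the reserve_one_page() stub, the reserve_calls counter and the
force parameter are all hypothetical names made up for this example; the real
amdgpu_ras_reserve_bad_pages may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's bad-page bookkeeping. */
struct bad_page_data {
	size_t count;         /* bad pages recorded so far */
	size_t last_reserved; /* pages [0, last_reserved) already reserved */
	size_t reserve_calls; /* per-page reservations issued (for the demo) */
};

/* Stub: a real driver would reserve the backing memory for page idx. */
static void reserve_one_page(struct bad_page_data *data, size_t idx)
{
	(void)idx;
	data->reserve_calls++;
}

/*
 * Reserve pending bad pages and return how many were processed.
 * With force == false only pages [last_reserved, count) are handled,
 * so the call is a no-op when last_reserved == count.
 * With force == true every page in [0, count) is reserved again,
 * which is what a reset that drops earlier reservations would need.
 */
static size_t reserve_bad_pages(struct bad_page_data *data, bool force)
{
	size_t start, i;

	assert(data->last_reserved <= data->count);

	start = force ? 0 : data->last_reserved;
	for (i = start; i < data->count; i++)
		reserve_one_page(data, i);

	data->last_reserved = data->count;
	return data->count - start;
}
```

In this model, a second call with force == false after all pages have been
reserved processes nothing, while a call with force == true walks the whole
range again, which is the behaviour the resume and reset paths would need.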