RE: [PATCH v2 04/10] drm/amdgpu/kfd: remove is_hws_hang and is_resetting

"Li, Yunxiang (Teddy)" <Yunxiang.Li@xxxxxxx> · Thu, 30 May 2024 00:06:25 +0000

[AMD Official Use Only - AMD Internal Distribution Only]

> One thing I could see going wrong is that
> down_read_trylock(&dqm->dev->adev->reset_domain->sem) will not fail
> immediately when the reset is scheduled. So there may be multiple attempts
> at HW access that detect an error or time out, which may get the HW into a
> worse state or delay the actual reset.

I suppose we can always check amdgpu_in_reset first, before we do down_read_trylock. This would prevent new readers from coming in while the reset thread is waiting on current readers to finish. With the rwsem alone, I suppose there's a chance that the writer would be starved?

Teddy