On 2021-08-16 12:20 p.m., Quan, Evan wrote: > [AMD Official Use Only] > > Hi Michel, > > The patch seems reasonable to me(especially the cancel_delayed_work_sync() part). > However, can you explain more about the code below? > What's the race issue here exactly? > > + /* mutex_lock could deadlock with cancel_delayed_work_sync in amdgpu_gfx_off_ctrl. */ > + if (!mutex_trylock(&adev->gfx.gfx_off_mutex)) { > + /* If there's a bug which causes amdgpu_gfx_off_ctrl to be called with enable=true > + * when adev->gfx.gfx_off_req_count is already 0, we might race with that. > + * Re-schedule to make sure gfx off will be re-enabled in the HW eventually. > + */ > + schedule_delayed_work(&adev->gfx.gfx_off_delay_work, AMDGPU_GFX_OFF_DELAY_ENABLE); > + return; > + } If amdgpu_gfx_off_ctrl was called with enable=true when adev->gfx.gfx_off_req_count == 0 already, it could have prevented amdgpu_device_delay_enable_gfx_off from locking the mutex. v3 solves this by only scheduling the work when adev->gfx.gfx_off_req_count transitions from 1 to 0, which means it no longer needs to lock the mutex. -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer