On 2021-08-12 1:33 p.m., Lazar, Lijo wrote: > On 8/12/2021 1:41 PM, Michel Dänzer wrote: >> On 2021-08-12 7:55 a.m., Koenig, Christian wrote: >>> Hi James, >>> >>> Evan seems to have understood how this all works together. >>> >>> See while any begin/end use critical section is active the work should not be active. >>> >>> When you handle only one ring you can just call cancel in begin use and schedule in end use. But when you have more than one ring you need a lock or counter to prevent concurrent work items to be started. >>> >>> Michelle's idea to use mod_delayed_work is a bad one because it assumes that the delayed work is still running. >> >> It merely assumes that the work may already have been scheduled before. >> >> Admittedly, I missed the cancel_delayed_work_sync calls for patch 2. While I think it can still have some effect when there's a single work item for multiple rings, as described by James, it's probably negligible, since presumably the time intervals between ring_begin_use and ring_end_use are normally much shorter than a second. >> >> So, while patch 2 is at worst a no-op (since mod_delayed_work is the same as schedule_delayed_work if the work hasn't been scheduled yet), I'm fine with dropping it. >> >> >>> Something similar applies to the first patch I think, >> >> There are no cancel work calls in that case, so the commit log is accurate TTBOMK. > > Curious - > > For patch 1, does it make a difference if any delayed work scheduled is cancelled in the else part before proceeding? > > } else if (!enable && adev->gfx.gfx_off_state) { > cancel_delayed_work(); I tried the patch below. While this does seem to fix the problem as well, I see a potential issue: 1. amdgpu_gfx_off_ctrl locks adev->gfx.gfx_off_mutex 2. amdgpu_device_delay_enable_gfx_off runs, blocks in mutex_lock 3. amdgpu_gfx_off_ctrl calls cancel_delayed_work_sync I'm afraid this would deadlock? (CONFIG_PROVE_LOCKING doesn't complain though) Maybe it's possible to fix it with cancel_delayed_work_sync somehow, but I'm not sure how offhand. (With cancel_delayed_work instead, I'm worried amdgpu_device_delay_enable_gfx_off might still enable GFXOFF in the HW immediately after amdgpu_gfx_off_ctrl unlocks the mutex. Then again, that might happen with mod_delayed_work as well...) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index a0be0772c8b3..3e4585ffb9af 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -570,8 +570,11 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable) if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) { schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE); - } else if (!enable && adev->gfx.gfx_off_state) { - if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) { + } else if (!enable) { + cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work); + + if (adev->gfx.gfx_off_state && + !amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) { adev->gfx.gfx_off_state = false; if (adev->gfx.funcs->init_spm_golden) { -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer