On 2021-08-17 12:37 p.m., Lazar, Lijo wrote: > > > On 8/17/2021 3:29 PM, Michel Dänzer wrote: >> On 2021-08-17 11:37 a.m., Lazar, Lijo wrote: >>> >>> >>> On 8/17/2021 2:56 PM, Michel Dänzer wrote: >>>> On 2021-08-17 11:12 a.m., Lazar, Lijo wrote: >>>>> >>>>> >>>>> On 8/17/2021 1:53 PM, Michel Dänzer wrote: >>>>>> From: Michel Dänzer <mdaenzer@xxxxxxxxxx> >>>>>> >>>>>> schedule_delayed_work does not push back the work if it was already >>>>>> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms >>>>>> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF >>>>>> was disabled and re-enabled again during those 100 ms. >>>>>> >>>>>> This resulted in frame drops / stutter with the upcoming mutter 41 >>>>>> release on Navi 14, due to constantly enabling GFXOFF in the HW and >>>>>> disabling it again (for getting the GPU clock counter). >>>>>> >>>>>> To fix this, call cancel_delayed_work_sync when the disable count >>>>>> transitions from 0 to 1, and only schedule the delayed work on the >>>>>> reverse transition, not if the disable count was already 0. This makes >>>>>> sure the delayed work doesn't run at unexpected times, and allows it to >>>>>> be lock-free. >>>>>> >>>>>> v2: >>>>>> * Use cancel_delayed_work_sync & mutex_trylock instead of >>>>>> mod_delayed_work. >>>>>> v3: >>>>>> * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König) >>>>>> v4: >>>>>> * Fix race condition between amdgpu_gfx_off_ctrl incrementing >>>>>> adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off >>>>>> checking for it to be 0 (Evan Quan) >>>>>> >>>>>> Cc: stable@xxxxxxxxxxxxxxx >>>>>> Reviewed-by: Lijo Lazar <lijo.lazar@xxxxxxx> # v3 >>>>>> Acked-by: Christian König <christian.koenig@xxxxxxx> # v3 >>>>>> Signed-off-by: Michel Dänzer <mdaenzer@xxxxxxxxxx> >>>>>> --- >>>>>> >>>>>> Alex, probably best to wait a bit longer before picking this up. :) >>>>>> >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++---- >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 36 +++++++++++++++------- >>>>>> 2 files changed, 30 insertions(+), 17 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> index f3fd5ec710b6..f944ed858f3e 100644 >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> @@ -2777,12 +2777,11 @@ static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work) >>>>>> struct amdgpu_device *adev = >>>>>> container_of(work, struct amdgpu_device, gfx.gfx_off_delay_work.work); >>>>>> - mutex_lock(&adev->gfx.gfx_off_mutex); >>>>>> - if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) { >>>>>> - if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true)) >>>>>> - adev->gfx.gfx_off_state = true; >>>>>> - } >>>>>> - mutex_unlock(&adev->gfx.gfx_off_mutex); >>>>>> + WARN_ON_ONCE(adev->gfx.gfx_off_state); >>>>>> + WARN_ON_ONCE(adev->gfx.gfx_off_req_count); >>>>>> + >>>>>> + if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true)) >>>>>> + adev->gfx.gfx_off_state = true; >>>>>> } >>>>>> /** >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c >>>>>> index a0be0772c8b3..b4ced45301be 100644 >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c >>>>>> @@ -563,24 +563,38 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable) >>>>>> mutex_lock(&adev->gfx.gfx_off_mutex); >>>>>> - if (!enable) >>>>>> - adev->gfx.gfx_off_req_count++; >>>>>> - else if (adev->gfx.gfx_off_req_count > 0) >>>>>> + if (enable) { >>>>>> + /* If the count is already 0, it means there's an imbalance bug somewhere. >>>>>> + * Note that the bug may be in a different caller than the one which triggers the >>>>>> + * WARN_ON_ONCE. >>>>>> + */ >>>>>> + if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0)) >>>>>> + goto unlock; >>>>>> + >>>>>> adev->gfx.gfx_off_req_count--; >>>>>> - if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) { >>>>>> - schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE); >>>>>> - } else if (!enable && adev->gfx.gfx_off_state) { >>>>>> - if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) { >>>>>> - adev->gfx.gfx_off_state = false; >>>>>> + if (adev->gfx.gfx_off_req_count == 0 && !adev->gfx.gfx_off_state) >>>>>> + schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE); >>>>>> + } else { >>>>>> + if (adev->gfx.gfx_off_req_count == 0) { >>>>>> + cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work); >>>>>> + >>>>>> + if (adev->gfx.gfx_off_state && >>>>> >>>>> More of a question which I didn't check last time - Is this expected to be true when the disable call comes in first? >>>> >>>> My assumption is that cancel_delayed_work_sync guarantees amdgpu_device_delay_enable_gfx_off's assignment is visible here. >>>> >>> >>> To clarify - when nothing is scheduled. If enable() is called when the count is 0, it goes to unlock. Now the expectation is someone to call Disable first. >> >> Yes, the very first amdgpu_gfx_off_ctrl call must pass enable=false, or it's a bug, which >> >> if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0)) >> >> will catch. >> >> >>> Let's say Disable() is called first, then the variable will be false, right? >> >> Ohh, I see what you mean. The first time amdgpu_gfx_off_ctrl is called with enable=false, adev->gfx.gfx_off_state == false (what it was initialized to), so it doesn't actually disable GFXOFF in HW. > > Exactly. Turns out that's not the end of that rabbit (side-)hole yet. :) amdgpu_device_init initializes adev->gfx.gfx_off_req_count = 1. amdgpu_gfx_off_ctrl is then called with enable=true from amdgpu_device_init → amdgpu_device_ip_late_init → amdgpu_device_set_pg_state. This schedules amdgpu_device_delay_enable_gfx_off, which runs ~100ms later, enables GFXOFF in the HW and sets adev->gfx.gfx_off_state = true. So it looks fine as is actually, if a bit convoluted. (I wonder if GFXOFF shouldn't rather be enabled synchronously during initialization though) -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer