[AMD Official Use Only - General]
Hi, Luben
Thanks for the review. Please see inline.
Best Regards,
Alan
-----Original Message-----
From: Tuikov, Luben <Luben.Tuikov@xxxxxxx>
Sent: Tuesday, March 28, 2023 3:00 AM
To: Liu, HaoPing (Alan) <HaoPing.Liu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Lakha, Bhawanpreet <Bhawanpreet.Lakha@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: Fix desktop freezed after gpu-reset
Hi,
That's a good fix. Some questions and comments below:
On 2023-03-27 11:20, Alan Liu wrote:
[Why]
After gpu-reset, sometimes the driver would fail to enable vblank irq,
causing flip_done timed out and the desktop freezed.
During gpu-reset, we will disable and enable vblank irq in
dm_suspend() and dm_resume(). Later on in
amdgpu_irq_gpu_reset_resume_helper(), we will check irqs' refcount and decide to enable or disable the irqs again.
However, we have 2 sets of API for controling vblank irq, one is
dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has
its own refcount and flag to store the state of vblank irq, and they
are not synchronized.
Is it possible to reconcile controlling VBlank IRQ to a single refcount?
In struct drm_vblank_crtc, we have “enabled” and “refcount” to store vblank irq state, and in struct amdgpu_irq_src we have “enabled_types” as the refcount for each irq in dm layer.
To reconcile vblank irq to a single refcount, my idea is to remove enabled and refcount from struct drm_vblank_crtc, and add a callback function like vblank_irq_enabled() to drm_crtc_funcs.
Drm layer can use this function to check the state or refcount of vblank irq from dm layer. But it may be dangerous because it is a change to drm layer. Do you have any comments?
In drm we use the first API to control vblank irq but in
amdgpu_irq_gpu_reset_resume_helper() we use the second set of API.
The failure happens when vblank irq was enabled by dm_vblank_get()
before gpu-reset, we have vblank->enabled true. However, during
gpu-reset, in amdgpu_irq_gpu_reset_resume_helper(), vblank irq's state
checked from
amdgpu_irq_update() is DISABLED. So finally it will disable vblank irq
again. After gpu-reset, if there is a cursor plane commit, the driver
will try to enable vblank irq by calling drm_vblank_enable(), but the
vblank->enabled is still true, so it fails to turn on vblank irq and
causes flip_done can't be completed in vblank irq handler and desktop
become freezed.
[How]
Combining the 2 vblank control APIs by letting drm's API finally calls
amdgpu_irq's API, so the irq's refcount and state of both APIs can be
synchronized. Also add a check to prevent refcount from being less
then
0 in amdgpu_irq_put().
Signed-off-by: Alan Liu <HaoPing.Liu@xxxxxxx <mailto:HaoPing.Liu@xxxxxxx>>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 3 +++
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c | 14
++++++++++----
2 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index a6aef488a822..1b66003657e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -597,6 +597,9 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
if (!src->enabled_types || !src->funcs->set)
return -EINVAL;
+ if (!amdgpu_irq_enabled(adev, src, type))
+ return 0;
+
if (atomic_dec_and_test(&src->enabled_types[type]))
return amdgpu_irq_update(adev, src, type);
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
index dc4f37240beb..e04f846b0b19 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
@@ -146,7 +146,7 @@ static void vblank_control_worker(struct
work_struct *work)
static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)
{
- enum dc_irq_source irq_source;
+ int irq_type;
struct amdgpu_crtc *acrtc = to_amdgpu_crtc(crtc);
struct amdgpu_device *adev = drm_to_adev(crtc->dev);
struct dm_crtc_state *acrtc_state = to_dm_crtc_state(crtc->state);
@@ -169,10 +169,16 @@ static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)
if (rc)
return rc;
- irq_source = IRQ_TYPE_VBLANK + acrtc->otg_inst;
+ irq_type = amdgpu_display_crtc_idx_to_irq_type(adev,
+acrtc->crtc_id);
+
+ if (enable)
+ rc = amdgpu_irq_get(adev, &adev->crtc_irq, irq_type);
+
+ else
There's an unnecessary empty line before the "else". It's a good idea to run patches through scripts/checkpatch.pl.
Thanks, will use the tool next time.
+ rc = amdgpu_irq_put(adev, &adev->crtc_irq, irq_type);
- if (!dc_interrupt_set(adev->dm.dc, irq_source, enable))
- return -EBUSY;
+ if (rc)
+ return rc;
skip:
if (amdgpu_in_reset(adev))
--
Regards,
Luben