Patch "drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel" has been added to the 5.15-stable tree

This is a note to let you know that I've just added the patch titled

    drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     drm-amdgpu-vkms-relax-timer-deactivation-by-hrtimer_.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 2291d9f1bcb49b557c2877157eaa2d6bc91da1d3
Author: Guchun Chen <guchun.chen@xxxxxxx>
Date:   Thu Jul 6 15:57:21 2023 +0800

    drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel
    
    [ Upstream commit b42ae87a7b3878afaf4c3852ca66c025a5b996e0 ]
    
    A CPU hard lockup may occur when the screen rotation loop below is run
    thousands of times with virtual display enabled, leaving the system
    unresponsive and eventually crashing it.
    
    do {
            xrandr --output Virtual --rotate inverted
            xrandr --output Virtual --rotate right
            xrandr --output Virtual --rotate left
            xrandr --output Virtual --rotate normal
    } while (1);
    
    NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
    
    ? hrtimer_run_softirq+0x140/0x140
    ? store_vblank+0xe0/0xe0 [drm]
    hrtimer_cancel+0x15/0x30
    amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu]
    drm_vblank_disable_and_save+0x185/0x1f0 [drm]
    drm_crtc_vblank_off+0x159/0x4c0 [drm]
    ? record_print_text.cold+0x11/0x11
    ? wait_for_completion_timeout+0x232/0x280
    ? drm_crtc_wait_one_vblank+0x40/0x40 [drm]
    ? bit_wait_io_timeout+0xe0/0xe0
    ? wait_for_completion_interruptible+0x1d7/0x320
    ? mutex_unlock+0x81/0xd0
    amdgpu_vkms_crtc_atomic_disable
    
    It is caused by a lock dependency between two CPUs in this scenario:
    
    CPU1                                             CPU2
    drm_crtc_vblank_off                              hrtimer_interrupt
        grab event_lock (irq disabled)                   __hrtimer_run_queues
            grab vbl_lock/vblank_time_lock                   amdgpu_vkms_vblank_simulate
                amdgpu_vkms_disable_vblank                       drm_handle_vblank
                    hrtimer_cancel                                   grab dev->event_lock
    
    CPU1 is stuck in hrtimer_cancel because the timer callback on the
    current clock base never finishes: the callback on CPU2 cannot complete
    because it fails to take the event_lock that CPU1 holds. The NMI
    watchdog reports the error once its threshold is exceeded, and the
    remaining CPUs are subsequently blocked as well.
    
    Fix this by using hrtimer_try_to_cancel, since the disable_vblank
    callback does not need to wait for the handler to finish. It is also
    unnecessary to check the return value of hrtimer_try_to_cancel: even if
    it is -1, meaning the timer callback is currently running, the timer
    will be reprogrammed by hrtimer_start when enable_vblank is called.
    
    v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes
    tag as well
    
    v3: drop warn printing (Christian)
    
    v4: drop superfluous check of vblank->enabled in timer function, as it's
    guaranteed in drm_handle_vblank (Christian)
    
    Fixes: 84ec374bd580 ("drm/amdgpu: create amdgpu_vkms (v4)")
    Cc: stable@xxxxxxxxxxxxxxx
    Suggested-by: Christian König <christian.koenig@xxxxxxx>
    Signed-off-by: Guchun Chen <guchun.chen@xxxxxxx>
    Reviewed-by: Christian König <christian.koenig@xxxxxxx>
    Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index 24251cdf95073..4e8274de8fc0c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -54,8 +54,9 @@ static enum hrtimer_restart amdgpu_vkms_vblank_simulate(struct hrtimer *timer)
 	WARN_ON(ret_overrun != 1);
 
 	ret = drm_crtc_handle_vblank(crtc);
+	/* Don't queue timer again when vblank is disabled. */
 	if (!ret)
-		DRM_ERROR("amdgpu_vkms failure on handling vblank");
+		return HRTIMER_NORESTART;
 
 	return HRTIMER_RESTART;
 }
@@ -80,7 +81,7 @@ static void amdgpu_vkms_disable_vblank(struct drm_crtc *crtc)
 {
 	struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
 
-	hrtimer_cancel(&amdgpu_crtc->vblank_timer);
+	hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer);
 }
 
 static bool amdgpu_vkms_get_vblank_timestamp(struct drm_crtc *crtc,
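
The two hunks above can be read together with a small userspace model. This is only a sketch, not kernel code: `model_timer`, `model_try_to_cancel`, `model_callback` and `model_self_check` are hypothetical names invented for illustration. The return values mirror the documented convention of hrtimer_try_to_cancel (0 if the timer was not active, 1 if it was active and has been cancelled, -1 if its callback is currently executing), and the callback model mirrors the patched amdgpu_vkms_vblank_simulate, which stops re-arming once vblank handling reports failure.

```c
#include <assert.h>

/* Hypothetical userspace model of a timer; NOT the kernel's struct hrtimer. */
enum model_state { MODEL_INACTIVE, MODEL_QUEUED, MODEL_CALLBACK_RUNNING };
enum model_restart { MODEL_NORESTART, MODEL_RESTART };

struct model_timer {
	enum model_state state;
};

/*
 * Mirrors the return convention of hrtimer_try_to_cancel():
 *   0  timer was not active
 *   1  timer was queued and has been cancelled
 *  -1  callback is running; return at once instead of waiting for it
 * Returning instead of spinning is what avoids the lockup described above,
 * since the caller may hold a lock that the running callback needs.
 */
int model_try_to_cancel(struct model_timer *t)
{
	switch (t->state) {
	case MODEL_CALLBACK_RUNNING:
		return -1;
	case MODEL_QUEUED:
		t->state = MODEL_INACTIVE;
		return 1;
	default:
		return 0;
	}
}

/*
 * Mirrors the patched callback: re-arm only when vblank handling succeeded.
 * vblank_handled stands in for the result of drm_crtc_handle_vblank().
 */
enum model_restart model_callback(int vblank_handled)
{
	if (!vblank_handled)
		return MODEL_NORESTART;
	return MODEL_RESTART;
}

/* Small usage example exercising the three timer states. */
void model_self_check(void)
{
	struct model_timer queued = { MODEL_QUEUED };
	struct model_timer running = { MODEL_CALLBACK_RUNNING };
	struct model_timer idle = { MODEL_INACTIVE };

	assert(model_try_to_cancel(&queued) == 1);
	assert(queued.state == MODEL_INACTIVE);
	assert(model_try_to_cancel(&running) == -1);
	assert(model_try_to_cancel(&idle) == 0);
	assert(model_callback(0) == MODEL_NORESTART);
}
```

In the -1 case the model leaves the timer state untouched: the disable path does not wait, and the callback itself decides not to restart once vblank is disabled.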


