Is there a PATCH 2/2 which I can't find, or is the subject wrong?

On 01/21/2016 09:16 AM, Mario Kleiner wrote:
The hardware vblank counter of AMD GPUs resets to zero during a modeset. The new implementation of drm_update_vblank_count() from commit 4dfd6486 "drm: Use vblank timestamps to guesstimate how many vblanks were missed", introduced in Linux 4.4, treats that as a counter wraparound and causes the software vblank counter to jump forward by a large distance of up to 2^24 counts. This interacts badly with 32-bit wraparound handling in drm_handle_vblank_events(), causing that function to no longer deliver pending vblank events to clients.

This leads to client hangs, especially if clients perform OpenGL or DRI3/Present animations while a modeset happens and triggers the hw vblank counter reset. One prominent example is a hang of KDE Plasma 5's startup progress splash screen during login, making the KDE session unusable.

Another small potential race exists when executing a modeset while vblank interrupts are enabled or just get enabled: The modeset updates radeon_crtc->lb_vblank_lead_lines during radeon_display_bandwidth_update, so if vblank interrupt handling or enable tried to access that variable multiple times at the wrong moment as part of drm_update_vblank_count, while the scanout happens to be within lb_vblank_lead_lines before the start of vblank, it could cause inconsistent vblank counting and again trigger a jump of the software vblank counter, causing similar client hangs. The easiest way to avoid this small race is to not allow vblank enable or vblank irqs during modeset.

This patch replaces calls to drm_vblank_pre/post_modeset in the driver's dpms code with calls to drm_vblank_off/on, as recommended for drivers with hw counters that reset to zero during modeset. Those calls disable vblank interrupts during the modeset sequence and properly reinitialize vblank counts and timestamps after the modeset, taking the hw counter reset into account, thereby fixing the problem of forward jumping counters.
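To make the counter jump described above concrete, here is a small userspace sketch of the arithmetic. It is a simplified model and not the kernel code: the helper names, the 24-bit counter mask, and the 2^23 delivery window are assumptions chosen to match the numbers mentioned in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the hardware frame counter: 24 bits. */
#define HW_COUNTER_MAX 0xffffffu

/*
 * Hypothetical helper: wraparound-tolerant difference between two reads
 * of the hardware counter. A reset to zero is indistinguishable from a
 * genuine wraparound here, which is the root of the problem.
 */
static uint32_t hw_counter_diff(uint32_t cur, uint32_t last, uint32_t max_count)
{
    assert((max_count & (max_count + 1)) == 0); /* mask must be 2^n - 1 */
    return (cur - last) & max_count;
}

/*
 * Hypothetical helper: an event is treated as due only if the software
 * count is at most 2^23 ahead of the requested sequence (32-bit
 * wraparound handling). Anything further ahead looks like "not yet".
 */
static int event_is_due(uint32_t sw_count, uint32_t requested_seq)
{
    return (uint32_t)(sw_count - requested_seq) <= (1u << 23);
}
```

With a last hardware read of 1000 and a first read of 3 after the reset, hw_counter_diff() yields 2^24 - 997 instead of a small number. Adding that to a software count of 5000 puts an event requested for sequence 5002 far outside the 2^23 window, so event_is_due() reports it as not due and the client keeps waiting.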
During a modeset, calls to drm_vblank_get() will no-op/intentionally fail, so no vblank events or pageflips can be queued during modesetting.

Radeon's static and dynpm power management uses drm_vblank_get to enable vblank irqs to synchronize reclocking to start of vblank. If a modeset happened in parallel with such a power management action, drm_vblank_get would be suppressed, sync to vblank wouldn't work and a visual glitch could happen. However, that glitch would hopefully be hidden by the blanking of the crtc during modeset. A small fix to power management makes sure to check for this and prevent unbalanced vblank reference counts due to mismatched drm_vblank_get/put.

Reported-by: Vlastimil Babka <vbabka@xxxxxxx>
Signed-off-by: Mario Kleiner <mario.kleiner.de@xxxxxxxxx>
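The balanced get/put pattern mentioned for power management can be sketched roughly as follows. This is a toy userspace model under stated assumptions, not the radeon code: struct toy_crtc and the toy_* functions are hypothetical stand-ins, and the only behaviour modelled is that a "get" fails while vblank is switched off around a modeset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-crtc vblank state. */
struct toy_crtc {
    bool vblank_off; /* true between "off" and "on" around a modeset */
    int  refcount;   /* outstanding vblank references */
};

/* Fails without taking a reference while vblank is off. */
static int toy_vblank_get(struct toy_crtc *c)
{
    if (c->vblank_off)
        return -EINVAL;
    c->refcount++;
    return 0;
}

static void toy_vblank_put(struct toy_crtc *c)
{
    assert(c->refcount > 0); /* catches an unbalanced put */
    c->refcount--;
}

/* Returns true if the reclock could be synchronized to vblank. */
static bool toy_reclock(struct toy_crtc *c)
{
    bool have_ref = (toy_vblank_get(c) == 0);

    /* ... wait for vblank only if have_ref, then reclock ... */

    if (have_ref) /* only drop a reference that was actually taken */
        toy_vblank_put(c);
    return have_ref;
}
```

The point of the sketch is only the pairing: the put is conditional on the get having succeeded, so a reclock that races with a modeset leaves the reference count unchanged.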
FWIW, this seems to work for the kde5 login issue, thanks. Let me know if you also need some specific testing/debug output, or testing of another approach if the "drm_vblank_on/off propaganda" is not acceptable :)