On Fri, Jan 06, 2023 at 07:17:53PM +0100, Daniel Vetter wrote: > Ok I think I was a bit slow here, and it makes sense. Except this now > means we loose this check, and I'm also not sure whether we really want > drivers to implement this all. > > What I think we want here is a bit more: > - for the self-refresh case check that the vblank all still works You mean, keep the WARN_ONCE(), but invert it to ensure that 'ret == 0'? I did consider that, but I don't know why I stopped. > - check that drivers which use self_refresh are not using > drm_atomic_helper_wait_for_vblanks(), because that would defeat the > point I'm a bit lost on this one. drm_atomic_helper_wait_for_vblanks() is part of the common drm_atomic_helper_commit_tail*() helpers, and so it's naturally used in many cases (including Rockchip/PSR). And how does it defeat the point? > - have a drm_crtc_vblank_off/on which take the crtc state, so they can > look at the self-refresh state And I suppose you mean this helper variant would kick off the next step (fake vblank timer)? > - fake vblanks with hrtimer, because on most hw when you turn off the crtc > the vblanks are also turned off, and so your compositor would still > hang. The vblank machinery already has all the code to make this happen > (and if it's not all, then i915 psr code should have it). Is a timer better than an interrupt? I'm pretty sure the vblank interrupts still can fire on Rockchip CRTC (VOP) (see also the other branch of this thread), so this isn't really necessary. (IGT vblank tests pass without hanging.) Unless you simply prefer a fake timer for some reason. Also, I still haven't found that fake timer machinery, but maybe I just don't know what I'm looking for. > - I think kunit tests for this all would be really good, it's a rather > complex state machinery between modesets and vblank functionality. You > can speed up the kunit tests with some really high refresh rate, which > isn't possible on real hw. Last time I tried my hand at kunit in a subsystem with no prior kunit tests, I had a miserable time and gave up. At least DRM has a few already, so maybe this wouldn't be as terrible. Perhaps I can give this a shot, but there's a chance this will kick things to the back burner far enough that I simply don't get around to it at all. (So far, I'm only addressing this because KernelCI complained.) > I'm also wondering why we've had this code for years and only hit issues > now? I'd guess a few reasons: 1. drm_self_refresh_helper_init() is only used by one driver -- Rockchip 2. Rockchip systems are most commonly either Chromebooks, or else otherwise cheap embedded things, and may not have displays at all, let alone displays with PSR 3. Rockchip Chromebooks shipped with a kernel forked off of the earlier PSR support, before everything got refactored (and vblank handling regressed) for the self-refresh "helpers". They only upgraded to a newer upstream kernel within the last few months. 4. AFAICT, ChromeOS user space doesn't even exercise the vblank-related ioctls, so we don't actually notice that this is "broken". I suppose it would only be IGT tests that notice. 5. I fixed up various upstream PSR bugs are part of #3 [0], along the way I unborked PSR enough that KernelCI finally caught the bug. See my explanation in [1] for why the vblank bug was masked, and appeared to be a "regression" due to my more recent fixes. Brian [0] Combined with point #2: ChromeOS would be the first serious users of the refactored PSR support. All this was needed to make it actually usable: (2021) c4c6ef229593 drm/bridge: analogix_dp: Make PSR-exit block less (2022) ca871659ec16 drm/bridge: analogix_dp: Support PSR-exit to disable transition <--- KernelCI "blamed" this one, because PSR was less broken (2022) e54a4424925a drm/atomic: Force bridge self-refresh-exit on CRTC switch [1] https://lore.kernel.org/dri-devel/Y6OCg9BPnJvimQLT@xxxxxxxxxx/ Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on rk3399-gru-kevin