Hi Mark, Sean, (and dri-devel) On Wed, Dec 14, 2022 at 07:04:37PM -0800, Brian Norris wrote: > On Tue, Dec 13, 2022 at 04:51:11PM +0000, Mark Brown wrote: > > On Tue, Dec 13, 2022 at 05:56:30AM -0800, KernelCI bot wrote: > > > > The KernelCI bisection bot found regressions in at least two KMS tests > > in the Renesas tree on rk3399-gru-kevin just after the Renesas tree > > merged up mainline: > > > > igt-kms-rockchip.kms_vblank.pipe-A-wait-forked > > igt-kms-rockchip.kms_vblank.pipe-A-query-busy > > > > which it bisected to ca871659ec16 ("drm/bridge: analogix_dp: Support > > PSR-exit to disable transition"). I'm not *100%* sure I trust the > > bisection but it sure is suspicous that two separate bisects for related > > issues landed on the same commit. ... > > | IGT-Version: 1.26-gf8a4a0b (aarch64) (Linux: 6.1.0 aarch64) > > | <14>[ 35.444448] [IGT] drm_read: starting subtest short-buffer-wakeup > > | Starting subtest: short-buffer-wakeup > > | > > | (| drm_read:350) CRITICAL: Test assertion failure function generate_event, file ../tests/drm_read.c:65: > > | (drm_read:350) CRITICAL: <14>[ 36.155642] [IGT] drm_read: exiting, ret=98 > > | Failed assertion: kmstest_get_vblank(fd, pipe, DRM_VBLANK_EVENT) > > | > > | (drm_read:350) CRITICAL: Last errno: 22, Invalid argument > > | Stack trace: > > | > > | #0 ../lib/igt_core.c:1933 __igt_fail_assert() > > | #1 [<unknown>+0xd5362770] > > | #2 [<unknown>+0xd536193c] > > | #3 [__libc_start_main+0xe8] > > | #4 [<unknown>+0xd5361974] > > | #5 [<unknown<6>[ 36.162851] Console: switching to colour frame buffer device 300x100>+0xd5361974] > > | Subtest short-buffer-wakeup failed. ... > I'll look some more, but this might be a difference of test setup, such > that my setup has the issue before and after that commit, but your setup > sees a regression. I believe this is the case; things are broken both before and after, but depending on test sequencing, etc., they might have seemed less broken before. When we're failing, we're getting EINVAL from drm_vblank_get(). That essentially means that vblank has been disabled (drm_crtc_vblank_off()). As it happens, this is exactly what we do when we enter PSR (Panel Self Refresh). Now, these test cases (especially 'kms_vblank.pipe-A-wait-forked') seem to display a static image, and then wait for a bunch of vblank events. This is exactly the sort of case where we should enter PSR, and so we're likely to disable vblank events. Thus, if everything is working right, we will often hit EINVAL in ioctl(DRM_IOCTL_WAIT_VBLANK) ... -> drm_vblank_get(), and fail the test. As to why this appears to be a regression: like mentioned in my previous mail, these tests tend to hose the Analogix PSR state before my patch set. When PSR is hosed, we tend to stay with PSR disabled, and so drm_vblank_get() doesn't return EINVAL, and the test is happy. (Never mind that the display is likely non-functional.) [0] So how to fix this? 1. ignore it; it's "just" a test failure, IIUC [1] 2. change the test, to skip this test if the device has PSR 3. leave vblank enabled even in the presence of PSR 4. otherwise modify the vblank ioctl interface, such that one can "wait" for a vblank that won't come (because the display is in self-refresh / there are no new frames or vblanks) I don't know how to do #2, because this variant of PSR is intentionally opaque to user space. For #3: the downstream PSR support that these systems shipped with initially did not disable vblank in PSR. But that was massively rewritten/refactored by Sean Paul before it made it upstream, and now it does. I tried briefly to factor that part out (drivers/gpu/drm/rockchip/rockchip_drm_vop.c / drm_crtc_vblank_{on,off}()), but because drm_self_refresh_helper.c (ab?)uses the commit step to "disable" the CRTC for PSR (even though the CRTC is not fully disabled), I tend to hit this in drm_atomic_helper_commit_modeset_disables()->disable_outputs(): ret = drm_crtc_vblank_get(crtc); WARN_ONCE(ret != -EINVAL, "driver forgot to call drm_crtc_vblank_off()\n"); And I hit a few more problems too... ...I'm sure I could hack my way through this somehow, but this is all sounding like it could use some advice from someone more familiar with DRM and/or the drm_self_refresh_helper design. I've learned my way around this a bit by necessity, but I still feel like I don't know all the right answers here. (*cough*, Sean?) Brian [0] I definitely reproduce this part locally, before my patches. I can't find non-expired CI logs for "passing" runs to be sure that's what the CI is seeing though. [1] Or is it? I don't really know the DRM_IOCTL_WAIT_VBLANK ABI contract in the presence of PSR, but I believe the upstream PSR support has always worked this way. I could imagine DRM_IOCTL_WAIT_VBLANK users might not love seeing EINVAL here though.