On Thu, Feb 1, 2018 at 9:20 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote: > Quoting Andy Lutomirski (2018-02-01 21:04:30) >> I got this after a recent suspend/resume: >> >> Feb 01 09:44:34 laptop systemd-logind[2412]: Lid closed. >> Feb 01 09:44:34 laptop systemd-logind[2412]: device-enumerator: scan all dirs >> Feb 01 09:44:34 laptop systemd-logind[2412]: device-enumerator: >> scanning /sys/bus >> Feb 01 09:44:34 laptop systemd-logind[2412]: device-enumerator: >> scanning /sys/class >> Feb 01 09:44:34 laptop systemd-logind[2412]: Failed to open >> configuration file '/etc/systemd/sleep.conf': No such file or >> directory >> Feb 01 09:44:34 laptop systemd-logind[2412]: Suspending... >> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal >> sender=n/a destination=n/a object=/org/freedesktop/login1 >> interface=org.freedesktop.login1.Manager member=PrepareForSleep >> cookie=570 reply >> Feb 01 09:44:34 laptop systemd-logind[2412]: Got message >> type=method_call sender=:1.46 destination=:1.1 >> object=/org/freedesktop/login1/session/_32 >> interface=org.freedesktop.login1.Session member=ReleaseDevice >> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal >> sender=n/a destination=:1.46 >> object=/org/freedesktop/login1/session/_32 >> interface=org.freedesktop.login1.Session member=PauseDevice cookie >> Feb 01 09:44:34 laptop gnome-shell[2630]: Failed to apply DRM plane >> transform 0: Permission denied >> Feb 01 09:44:34 laptop gnome-shell[2630]: drmModeSetCursor2 failed >> with (Permission denied), drawing cursor with OpenGL from now on >> >> But I don't see the word "cursor" in my system logs before the first >> suspend. What am I looking for? This is Fedora 27 running a Gnome >> Wayland session, but it hasn't been reinstalled in some time, so it's >> possible that there are some weird settings sitting around. But I did >> check and I have no weird i915 parameters. > > You are using gnome-shell as the display server. From that it appears to > have started off with a HW cursor and switched to a SW cursor after > suspend. Did you notice a change in behaviour? After rebooting or just > restarting gnome-shell? I think it's less consistently bad after a reboot before suspending. > >> Also, are these things potentially related: >> >> [ 3067.702527] [drm:intel_pipe_update_start [i915]] *ERROR* Potential >> atomic update failure on pipe A > > They are just "missed the immediate vblank for the screen update" > messages. Should not be related to PSR, but may cause jitter by delaying > the odd screen update. I just got this one, and the timestamp is at least reasonably close to a giant latency spike: [ 288.799654] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=31 end=32) time 15 us, min 1073, max 1079, scanline start 1087, end 1088 > >> As I'm typing this, I've seen a couple instances of what seems like a >> full *second* of cursor latency, but I've only gotten the potential >> atomic update failure once. >> >> And is there any straightforward tracing to do to distinguish between >> PSR exit latency and other potential sources of latency? > > It looks plausible that we could at least report how long it takes the > registers to reflect the change in state (but we don't). The best source > of information atm is /sys/kernel/debug/dri/0/i915_edp_psr_status. Hmm. I went and looked at the code, and I noticed what could be bugs or could (more likely) be my confusion since I don't know this code at all: intel_single_frame_update() does something inscrutable to me, but I imagine it does something that causes the next page flip to get noticed by the panel even with PSR on. But how does the code that calls it know that anything happened? (Looking at the commit history, maybe this is something special that's only needed on some platforms but doesn't replace the normal PSR exit sequence.) Perhaps more interestingly, intel_psr_flush() does this: /* By definition flush = invalidate + flush */ if (frontbuffer_bits) intel_psr_exit(dev_priv); if (!dev_priv->psr.active && !dev_priv->psr.busy_frontbuffer_bits) if (!work_busy(&dev_priv->psr.work.work)) schedule_delayed_work(&dev_priv->psr.work, msecs_to_jiffies(100)); I'm guessing that the idea is that we're turning off PSR because we want the panel to update and we expect that, in 100ms, the update will have hit the panel and we'll have been idle long enough for it to make sense to re-enter PSR. IOW, the code wants PSR to be off for at least 100ms and then to turn back on. But this code actually says "turn PSR back on in at *most* 100ms". What happens if there are two screen updates 99ms apart? The first one should work fine, but the next one will hit with 1ms left on the delayed work, and intel_psr_work() will get called in 1ms. There's some magic with busy_frontbuffer_bits, but it seems questionable to me that intel_psr_flush() clears busy_frontbuffer_bits and *then* calls intel_psr_exit(). Naively, I would expect that PSR needs to be kept off until the vblank following the page flip. Also, in intel_psr_work(), shouldn't this code: /* * The delayed work can race with an invalidate hence we need to * recheck. Since psr_flush first clears this and then reschedules we * won't ever miss a flush when bailing out here. */ if (dev_priv->psr.busy_frontbuffer_bits) goto unlock; re-arm the delayed work? Anyway, this is all on a 4.14 kernel. I should update to 4.16 and see what happens. _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel