Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

Daniel Stone <daniel@xxxxxxxxxxxxx> · Tue, 1 Jun 2021 18:42:21 +0100

Hi,

On Tue, 1 Jun 2021 at 14:18, Michel Dänzer <michel@xxxxxxxxxxx> wrote:
> On 2021-06-01 2:10 p.m., Christian König wrote:
> > Am 01.06.21 um 12:49 schrieb Michel Dänzer:
> >> There isn't a choice for Wayland compositors in general, since there can be arbitrary other state which needs to be applied atomically together with the new buffer. (Though in theory, a compositor might get fancy and special-case surface commits which can be handled by waiting on the GPU)

Yeah, this is pretty crucial.

> >> Latency is largely a matter of scheduling in the compositor. The latency incurred by the compositor shouldn't have to be more than single-digit milliseconds. (I've seen total latency from when the client starts processing a (static) frame to when it starts being scanned out as low as ~6 ms with https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1620, lower than typical with Xorg)
> >
> > Well let me describe it like this:
> >
> > We have an use cases for 144 Hz guaranteed refresh rate. That essentially means that the client application needs to be able to spit out one frame/window content every ~6.9ms. That's tough, but doable.
> >
> > When you now add 6ms latency in the compositor that means the client application has only .9ms left for it's frame which is basically impossible to do.
>
> You misunderstood me. 6 ms is the lowest possible end-to-end latency from client to scanout, but the client can start as early as it wants/needs to. It's a trade-off between latency and the risk of missing a scanout cycle.

Not quite.

When weston-presentation-shm is reporting is a 6ms delta between when
it started its rendering loop and when the frame was presented to
screen. How w-p-s was run matters a lot, because you can insert an
arbitrary delay in there to simulate client rendering. It also matters
a lot that the client is SHM, because that will result in Mutter doing
glTexImage2D on whatever size the window is, then doing a full GL
compositing pass, so even if it's run with zero delay, 6ms isn't 'the
amount of time it takes Mutter to get a frame to screen', it's
measuring the overhead of a texture upload and full-screen composition
as well.

I'm assuming the 'guaranteed 144Hz' target is a fullscreen GL client,
for which you definitely avoid TexImage2D, and could hopefully
(assuming the client picks a modifier which can be displayed) also
avoid the composition pass in favour of direct scanout from the client
buffer; that would give you a much lower number.

Each compositor has its own heuristics around timing. They all make
their own tradeoff between low latency and fewest dropped frames.
Obviously, the higher your latency, the lower the chance of missing a
deadline. There's a lot of detail in the MR that Michel linked.

Cheers,
Daniel