RE: [RFC v1 0/4] drm: Add support for DRM_CAP_DEFERRED_OUT_FENCE capability

"Kasireddy, Vivek" <vivek.kasireddy@xxxxxxxxx> · Wed, 4 Aug 2021 07:25:37 +0000

Hi Michel,

> >
> >>> The goal:
> >>> - Maintain full framerate even when the Guest scanout FB is flipped onto a hardware
> >> plane
> >>> on the Host -- regardless of either compositor's scheduling policy -- without making
> any
> >>> copies and ensuring that both Host and Guest are not accessing the buffer at the same
> >> time.
> >>>
> >>> The problem:
> >>> - If the Host compositor flips the client's buffer (in this case Guest compositor's
> buffer)
> >>> onto a hardware plane, then it can send a wl_buffer.release event for the previous
> buffer
> >>> only after it gets a pageflip completion. And, if the Guest compositor takes 10-12 ms
> to
> >>> submit a new buffer and given the fact that the Host compositor waits only for 9 ms,
> the
> >>> Guest compositor will miss the Host's repaint cycle resulting in halved frame-rate.
> >>>
> >>> The solution:
> >>> - To ensure full framerate, the Guest compositor has to start it's repaint cycle
> (including
> >>> the 9 ms wait) when the Host compositor sends the frame callback event to its clients.
> >>> In order for this to happen, the dma-fence that the Guest KMS waits on -- before
> sending
> >>> pageflip completion -- cannot be tied to a wl_buffer.release event. This means that,
> the
> >>> Guest compositor has to be forced to use a new buffer for its next repaint cycle when
> it
> >>> gets a pageflip completion.
> >>
> >> Is that really the only solution?
> > [Kasireddy, Vivek] There are a few others I mentioned here:
> > https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_986572
> > But I think none of them are as compelling as this one.
> >
> >>
> >> If we fix the event timestamps so that both guest and host use the same
> >> timestamp, but then the guest starts 5ms (or something like that) earlier,
> >> then things should work too? I.e.
> >> - host compositor starts at (previous_frametime + 9ms)
> >> - guest compositor starts at (previous_frametime + 4ms)
> >>
> >> Ofc this only works if the frametimes we hand out to both match _exactly_
> >> and are as high-precision as the ones on the host side. Which for many gpu
> >> drivers at least is the case, and all the ones you care about for sure :-)
> >>
> >> But if the frametimes the guest receives are the no_vblank fake ones, then
> >> they'll be all over the place and this carefully tuned low-latency redraw
> >> loop falls apart. Aside fromm the fact that without tuning the guests to
> >> be earlier than the hosts, you're guaranteed to miss every frame (except
> >> when the timing wobbliness in the guest is big enough by chance to make
> >> the deadline on the oddball frame).
> > [Kasireddy, Vivek] The Guest and Host use different event timestamps as we don't
> > share these between the Guest and the Host. It does not seem to be causing any other
> > problems so far but we did try the experiment you mentioned (i.e., adjusting the delays)
> > and it works. However, this patch series is meant to fix the issue without having to tweak
> > anything (delays) because we can't do this for every compositor out there.
> 
> Maybe there could be a mechanism which allows the compositor in the guest to
> automatically adjust its repaint cycle as needed.
> 
> This might even be possible without requiring changes in each compositor, by adjusting
> the vertical blank periods in the guest to be aligned with the host compositor repaint
> cycles. Not sure about that though.
[Kasireddy, Vivek] The problem really is that the Guest compositor -- or any other compositor
for that matter -- assumes that after a pageflip completion, the old buffer submitted in the
previous flip is free and can be reused again. I think this is a guarantee given by KMS. If we have
to enforce this, we (Guest KMS) have to wait until the Host compositor sends a wl_buffer.release --
which can only happen after Host gets a pageflip completion assuming it uses hardware planes .
>From this point onwards, the Guest compositor only has 9 ms (in the case of Weston) -- or less
based on the Host compositor's scheduling policy -- to submit a new frame.

Although, we can adjust the repaint-window of the Guest compositor to ensure a submission 
within 9 ms or increase the delay on the Host, these tweaks are just heuristics. I think in order
to have a generic solution that'll work in all cases means that the Guest compositor has to start
its repaint cycle with a new buffer when the Host sends out the frame callback event.

> 
> Even if not, both this series or making it possible to queue multiple flips require
> corresponding changes in each compositor as well to have any effect.
[Kasireddy, Vivek] Yes, unfortunately; but the hope is that the Guest KMS can do most of
the heavy lifting and keep the changes for the compositors generic enough and minimal.

Thanks,
Vivek
> 
> 
> --
> Earthling Michel Dänzer               |               https://redhat.com
> Libre software enthusiast             |             Mesa and X developer