Hi Michel,

> >>> The goal:
> >>> - Maintain full framerate even when the Guest scanout FB is flipped onto a
> >>>   hardware plane on the Host -- regardless of either compositor's scheduling
> >>>   policy -- without making any copies and ensuring that both Host and Guest
> >>>   are not accessing the buffer at the same time.
> >>>
> >>> The problem:
> >>> - If the Host compositor flips the client's buffer (in this case Guest
> >>>   compositor's buffer) onto a hardware plane, then it can send a
> >>>   wl_buffer.release event for the previous buffer only after it gets a
> >>>   pageflip completion. And, if the Guest compositor takes 10-12 ms to submit
> >>>   a new buffer and given the fact that the Host compositor waits only for
> >>>   9 ms, the Guest compositor will miss the Host's repaint cycle resulting in
> >>>   halved frame-rate.
> >>>
> >>> The solution:
> >>> - To ensure full framerate, the Guest compositor has to start its repaint
> >>>   cycle (including the 9 ms wait) when the Host compositor sends the frame
> >>>   callback event to its clients. In order for this to happen, the dma-fence
> >>>   that the Guest KMS waits on -- before sending pageflip completion -- cannot
> >>>   be tied to a wl_buffer.release event. This means that, the Guest compositor
> >>>   has to be forced to use a new buffer for its next repaint cycle when it
> >>>   gets a pageflip completion.
> >>
> >> Is that really the only solution?
> > [Kasireddy, Vivek] There are a few others I mentioned here:
> > https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_986572
> > But I think none of them are as compelling as this one.
> >
> >>
> >> If we fix the event timestamps so that both guest and host use the same
> >> timestamp, but then the guest starts 5ms (or something like that) earlier,
> >> then things should work too? I.e.
> >> - host compositor starts at (previous_frametime + 9ms)
> >> - guest compositor starts at (previous_frametime + 4ms)
> >>
> >> Ofc this only works if the frametimes we hand out to both match _exactly_
> >> and are as high-precision as the ones on the host side. Which for many gpu
> >> drivers at least is the case, and all the ones you care about for sure :-)
> >>
> >> But if the frametimes the guest receives are the no_vblank fake ones, then
> >> they'll be all over the place and this carefully tuned low-latency redraw
> >> loop falls apart. Aside from the fact that without tuning the guests to
> >> be earlier than the hosts, you're guaranteed to miss every frame (except
> >> when the timing wobbliness in the guest is big enough by chance to make
> >> the deadline on the oddball frame).
> > [Kasireddy, Vivek] The Guest and Host use different event timestamps as we don't
> > share these between the Guest and the Host. It does not seem to be causing any other
> > problems so far but we did try the experiment you mentioned (i.e., adjusting the delays)
> > and it works. However, this patch series is meant to fix the issue without having to tweak
> > anything (delays) because we can't do this for every compositor out there.
>
> Maybe there could be a mechanism which allows the compositor in the guest to
> automatically adjust its repaint cycle as needed.
>
> This might even be possible without requiring changes in each compositor, by adjusting
> the vertical blank periods in the guest to be aligned with the host compositor repaint
> cycles. Not sure about that though.

[Kasireddy, Vivek] The problem really is that the Guest compositor -- or any other
compositor for that matter -- assumes that after a pageflip completion, the old buffer
submitted in the previous flip is free and can be reused again. I think this is a
guarantee given by KMS.
If we have to enforce this, we (Guest KMS) have to wait until the Host compositor
sends a wl_buffer.release -- which can only happen after the Host gets a pageflip
completion, assuming it uses hardware planes. From this point onwards, the Guest
compositor only has 9 ms (in the case of Weston) -- or less based on the Host
compositor's scheduling policy -- to submit a new frame. Although we can adjust the
repaint-window of the Guest compositor to ensure a submission within 9 ms or increase
the delay on the Host, these tweaks are just heuristics. I think a generic solution
that works in all cases requires the Guest compositor to start its repaint cycle
with a new buffer when the Host sends out the frame callback event.

>
> Even if not, both this series or making it possible to queue multiple flips require
> corresponding changes in each compositor as well to have any effect.

[Kasireddy, Vivek] Yes, unfortunately; but the hope is that the Guest KMS can do most
of the heavy lifting and keep the changes for the compositors generic enough and minimal.

Thanks,
Vivek

>
>
> --
> Earthling Michel Dänzer | https://redhat.com
> Libre software enthusiast | Mesa and X developer
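
P.S. To make the timing argument above concrete, here is a rough toy model; it is
only a sketch and not part of the patch series. It assumes a 60 Hz Host display,
a fixed 9 ms Host repaint delay after each vblank, and that the Host's frame
callback goes out at the moment the Host starts its repaint. The function and
constant names are made up for illustration.

```python
# Toy timing model of the Host/Guest repaint interaction (illustrative only).
# Assumptions: 60 Hz host refresh, host repaint starts 9 ms after each vblank,
# and the host frame callback is sent when the host repaint starts.

REFRESH_US = 16_667    # assumed host refresh period, in microseconds
HOST_WAIT_US = 9_000   # host starts repainting this long after each vblank


def next_host_repaint(t_us):
    """Return the earliest host repaint start time at or after t_us."""
    # Host repaints happen at k * REFRESH_US + HOST_WAIT_US for k = 0, 1, 2, ...
    k = max(0, -(-(t_us - HOST_WAIT_US) // REFRESH_US))  # integer ceiling
    return k * REFRESH_US + HOST_WAIT_US


def guest_frames_shown(guest_work_us, start_on_frame_callback,
                       duration_us=1_000_000):
    """Count guest frames that reach the host screen within duration_us.

    start_on_frame_callback=False: the guest starts its next frame only when
    its previous buffer is released, i.e. at the host pageflip completion.
    start_on_frame_callback=True: the guest starts its next frame (with a new
    buffer) as soon as the host sends the frame callback.
    """
    start = 0
    frames = 0
    while True:
        submit = start + guest_work_us
        repaint = next_host_repaint(submit)              # host picks the frame up here
        flip_done = repaint - HOST_WAIT_US + REFRESH_US  # vblank after that repaint
        if flip_done > duration_us:
            return frames
        frames += 1
        start = repaint if start_on_frame_callback else flip_done


if __name__ == "__main__":
    for work_us in (8_000, 11_000):
        print(work_us,
              guest_frames_shown(work_us, start_on_frame_callback=False),
              guest_frames_shown(work_us, start_on_frame_callback=True))
```

In this model, a Guest that needs 11 ms per frame and waits for the release
lands just after the Host's 9 ms deadline every time, so it gets roughly every
other refresh, while starting on the frame callback gives it a full refresh
period to submit. A Guest that needs only 8 ms keeps up either way, which is
why tweaking the delays appears to work.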