Hi Michel,

> On 2021-08-10 10:30 a.m., Daniel Vetter wrote:
> > On Tue, Aug 10, 2021 at 08:21:09AM +0000, Kasireddy, Vivek wrote:
> >>> On Fri, Aug 06, 2021 at 07:27:13AM +0000, Kasireddy, Vivek wrote:
> >>>>>>>
> >>>>>>> Hence my gut feeling reaction that first we need to get these two
> >>>>>>> compositors aligned in their timings, which probably needs
> >>>>>>> consistent vblank periods/timestamps across them (plus/minus
> >>>>>>> guest/host clocksource fun ofc). Without this any of the next steps
> >>>>>>> will simply not work because there's too much jitter by the time the
> >>>>>>> guest compositor gets the flip completion events.
> >>>>>> [Kasireddy, Vivek] Timings are not a problem and do not significantly
> >>>>>> affect the repaint cycles from what I have seen so far.
> >>>>>>
> >>>>>>>
> >>>>>>> Once we have solid events I think we should look into statically
> >>>>>>> tuning guest/host compositor deadlines (like you've suggested in a
> >>>>>>> bunch of places) to consistently make that deadline and hit 60 fps.
> >>>>>>> With that we can then look into tuning this automatically and what to
> >>>>>>> do when e.g. switching between copying and zero-copy on the host side
> >>>>>>> (which might be needed in some cases) and how to handle all that.
> >>>>>> [Kasireddy, Vivek] As I confirm here:
> >>>>>> https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_984065
> >>>>>> tweaking the deadlines works (i.e., we get 60 FPS) as we expect. However,
> >>>>>> I feel that this zero-copy solution I am trying to create should be independent
> >>>>>> of compositors' deadlines, delays or other scheduling parameters.
> >>>>>
> >>>>> That's not how compositors work nowadays. Your problem is that you don't
> >>>>> have the guest/host compositor in sync. zero-copy only changes the timing,
> >>>>> so it changes things from "rendering way too many frames" to "rendering
> >>>>> way too few frames".
> >>>>>
> >>>>> We need to fix the timing/sync issue here first, not paper over it with
> >>>>> hacks.
> >>>> [Kasireddy, Vivek] What I really meant is that the zero-copy solution should be
> >>>> independent of the scheduling policies to ensure that it works with all compositors.
> >>>> IIUC, Weston for example uses the vblank/pageflip completion timestamp, the
> >>>> configurable repaint-window value, refresh-rate, etc to determine when to start
> >>>> its next repaint -- if there is any damage:
> >>>> timespec_add_nsec(&output->next_repaint, stamp, refresh_nsec);
> >>>> timespec_add_msec(&output->next_repaint, &output->next_repaint, -compositor->repaint_msec);
> >>>>
> >>>> And, in the case of VKMS, since there is no real hardware, the timestamp is always:
> >>>> now = ktime_get();
> >>>> send_vblank_event(dev, e, seq, now);
> >>>
> >>> vkms has been fixed since a while to fake high-precision timestamps like
> >>> from a real display.
> >> [Kasireddy, Vivek] IIUC, that might be one of the reasons why the Guest does not need
> >> to have the same timestamp as that of the Host -- to work as expected.
> >>
> >>>
> >>>> When you say that the Guest/Host compositor need to stay in sync, are you
> >>>> suggesting that we need to ensure that the vblank timestamp on the Host
> >>>> needs to be shared and be the same on the Guest and a vblank/pageflip
> >>>> completion for the Guest needs to be sent at exactly the same time it is sent
> >>>> on the Host? If yes, I'd say that we do send the pageflip completion to Guest
> >>>> around the same time a vblank is generated on the Host but it does not help
> >>>> because the Guest compositor would only have 9 ms to submit a new frame
> >>>> and if the Host is running Mutter, the Guest would only have 2 ms.
> >>>> (https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_984341)
> >>>
> >>> Not at the same time, but the same timestamp. And yes there is some fun
> >>> there, which is I think the fundamental issue.
> >>> Or at least some of the compositor experts seem to think so, and it
> >>> makes sense to me.
> >> [Kasireddy, Vivek] It is definitely possible that if the timestamp is messed up, then
> >> the Guest repaint cycle would be affected. However, I do not believe that is the case
> >> here given the debug and instrumentation data we collected and scrutinized. Hopefully,
> >> compositor experts could chime in to shed some light on this matter.
> >>
> >>>
> >>>>>
> >>>>> Only, and I really mean only, when that shows that it's simply impossible
> >>>>> to hit 60fps with zero-copy and the guest/host fully aligned should we
> >>>>> look into making the overall pipeline deeper.
> >>>> [Kasireddy, Vivek] From all the experiments conducted so far and given the
> >>>> discussion associated with
> >>>> https://gitlab.freedesktop.org/wayland/weston/-/issues/514
> >>>> I think we have already established that in order for a zero-copy solution to work
> >>>> reliably, the Guest compositor needs to start its repaint cycle when the Host
> >>>> compositor sends a frame callback event to its clients.
> >>>>
> >>>>>
> >>>>>>> Only when that all shows that we just can't hit 60fps consistently and
> >>>>>>> really need 3 buffers in flight should we look at deeper kms queues.
> >>>>>>> And then we really need to implement them properly and not with a
> >>>>>>> mismatch between drm_event and out-fence signalling. These quick hacks
> >>>>>>> are good for experiments, but there's a pile of other things we need
> >>>>>>> to do first. At least that's how I understand the problem here right
> >>>>>>> now.
> >>>>>> [Kasireddy, Vivek] Experiments done so far indicate that we can hit 59 FPS
> >>>>>> consistently -- in a zero-copy way independent of compositors' delays/deadlines --
> >>>>>> with this patch series + the Weston MR I linked in the cover letter.
> >>>>>> The main reason why this works is that we relax the assumption that when the
> >>>>>> Guest compositor gets a pageflip completion event it can reuse the old FB it
> >>>>>> submitted in the previous atomic flip, and instead force it to use a new one.
> >>>>>> And, we send the pageflip completion event to the Guest when the Host compositor
> >>>>>> sends a frame callback event. Lastly, we use the (deferred) out_fence as just a
> >>>>>> mechanism to tell the Guest compositor when it can release references on old FBs
> >>>>>> so that they can be reused again.
> >>>>>>
> >>>>>> With that being said, the only question is how can we accomplish the above in an
> >>>>>> upstream acceptable way without regressing anything, particularly on bare-metal.
> >>>>>> It's not clear if just increasing the queue depth would work or not but I think
> >>>>>> the Guest compositor has to be told when it can start its repaint cycle and when
> >>>>>> it can assume the old FB is no longer in use.
> >>>>>> On bare-metal -- and also with VKMS as of today -- a pageflip completion
> >>>>>> indicates both.
> >>>>>> In other words, Vblank event is the same as Flip done, which makes sense on
> >>>>>> bare-metal.
> >>>>>> But if we were to have two events, at least for VKMS: vblank to indicate to Guest
> >>>>>> to start repaint and flip_done to indicate to drop references on old FBs, I think
> >>>>>> this problem can be solved even without increasing the queue depth. Can this be
> >>>>>> acceptable?
> >>>>>
> >>>>> That's just another flavour of your "increase queue depth without
> >>>>> increasing the atomic queue depth" approach. I still think the underlying
> >>>>> fundamental issue is a timing confusion, and the fact that adjusting the
> >>>>> timings fixes things too kinda proves that. So we need to fix that in a
> >>>>> clean way, not by shuffling things around semi-randomly until the specific
> >>>>> config we test works.
> >>>> [Kasireddy, Vivek] This issue is not due to a timing or timestamp mismatch. We
> >>>> have carefully instrumented both the Host and Guest compositors and measured
> >>>> the latencies at each step. The relevant debug data only points to the scheduling
> >>>> policy -- of both Host and Guest compositors -- playing a role in Guest rendering
> >>>> at 30 FPS.
> >>>
> >>> Hm but that essentially means that the events you're passing around have an
> >>> even more ad-hoc implementation specific meaning: Essentially it's the
> >>> kick-off for the guest's repaint loop? That sounds even worse for a kms
> >>> uapi extension.
> >> [Kasireddy, Vivek] The pageflip completion event/vblank event indeed serves as the
> >> kick-off for a compositor's (both Guest and Host) repaint loop. AFAICT, Weston
> >> works that way and even if we increase the queue depth to solve this problem, I don't
> >> think it'll help because the arrival of this event always indicates to a compositor to
> >> start its repaint cycle again and assume that the previous buffers are all free.
> >
> > I thought this is how simple compositors work, and weston has since a
> > while its own timer, which is based on the timestamp it gets (at least on
> > drivers with vblank support), so that it starts the repaint loop a few ms
> > before the next vblank. And not immediately when it receives the old page
> > flip completion event.
>
> As long as it's a fixed timer, there's still a risk that the guest compositor repaint cycle runs
> too late for the host one (unless the guest cycle happens to be scheduled significantly
> earlier than the host one).
>
> Note that current mutter Git main (to become the 41 release this autumn) uses dynamic
> scheduling of its repaint cycle based on how long the last 16 frames took to draw and
> present. In theory, this could automatically schedule the guest cycle early enough for the
> host one.
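
To make the dynamic scheduling idea quoted above concrete, here is a minimal sketch
of one way a compositor could pick its repaint start time from the last 16 frame
durations. This is not Mutter's actual code: the struct, the function names, the use
of the maximum over the window, and the safety margin are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_HISTORY 16 /* window size mentioned in the thread */

/* Hypothetical tracker: draw + present durations of recent frames, in us. */
struct frame_history {
	int64_t duration_us[FRAME_HISTORY];
	unsigned int count; /* valid samples, at most FRAME_HISTORY */
	unsigned int next;  /* ring-buffer write index */
};

/* Record how long the most recent frame took to draw and present. */
void frame_history_record(struct frame_history *h, int64_t duration_us)
{
	assert(duration_us >= 0);
	h->duration_us[h->next] = duration_us;
	h->next = (h->next + 1) % FRAME_HISTORY;
	if (h->count < FRAME_HISTORY)
		h->count++;
}

/* Worst case over the window; 0 if there are no samples yet. */
int64_t frame_history_max(const struct frame_history *h)
{
	int64_t max = 0;

	for (unsigned int i = 0; i < h->count; i++)
		if (h->duration_us[i] > max)
			max = h->duration_us[i];
	return max;
}

/*
 * Start the next repaint as late as possible while still leaving room for
 * the worst recent frame plus a margin before the next vblank. With no
 * history, or if the budget exceeds one refresh period, start immediately
 * at the last vblank.
 */
int64_t next_repaint_start_us(const struct frame_history *h,
			      int64_t last_vblank_us,
			      int64_t refresh_us,
			      int64_t margin_us)
{
	int64_t budget_us = frame_history_max(h) + margin_us;

	if (h->count == 0 || budget_us > refresh_us)
		budget_us = refresh_us;
	return last_vblank_us + refresh_us - budget_us;
}
```

With a 16667 us refresh period, a worst recent frame of 5000 us and a 1000 us margin,
the repaint would start 10667 us after the last vblank. A guest using something like
this would move its start time earlier whenever presenting through the host takes
longer, which is the effect hoped for above.
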

[Kasireddy, Vivek] I'd like to try it out soon; it'd be very interesting to see how
Mutter works in both Guest and Host with this new scheduling policy. Having said that,
I think there is still a need to come up with a comprehensive solution that is
independent of compositors' scheduling policies. To that end, I am thinking of splitting
the pageflip completion event into two events: vblank event (to indicate to compositor
to start repaint) and flip_done event (to indicate to release references on old FBs).
Or, introduce two new signals/fences along similar lines. Thoughts?

Thanks,
Vivek

>
>
> --
> Earthling Michel Dänzer               |               https://redhat.com
> Libre software enthusiast             |             Mesa and X developer
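
The vblank/flip_done split proposed in this mail can be sketched from the Guest
compositor's point of view as follows. This is only an illustration of the intended
semantics, not existing KMS uapi or Weston code: the three-FB pool, the state names
and the two handler functions are hypothetical.

```c
#include <assert.h>

#define NUM_FBS 3 /* assumption: the Guest keeps a small pool of FBs */

enum fb_state { FB_FREE, FB_QUEUED, FB_SCANOUT };

/* Hypothetical Guest-side bookkeeping for one output. */
struct guest_output {
	enum fb_state fb[NUM_FBS];
};

/* Return the index of the first FB in the given state, or -1 if none. */
int find_fb(const struct guest_output *out, enum fb_state state)
{
	for (int i = 0; i < NUM_FBS; i++)
		if (out->fb[i] == state)
			return i;
	return -1;
}

/*
 * vblank event: kick off a repaint. It says nothing about old FBs, so
 * the Guest must render into an FB that is known to be free. Returns the
 * FB submitted, or -1 if the frame has to be skipped.
 */
int on_vblank(struct guest_output *out)
{
	int fb = find_fb(out, FB_FREE);

	if (fb < 0)
		return -1;
	out->fb[fb] = FB_QUEUED;
	return fb;
}

/*
 * flip_done event for new_fb: the Host now shows new_fb, so the FB that
 * was shown before can be released for reuse.
 */
void on_flip_done(struct guest_output *out, int new_fb)
{
	assert(new_fb >= 0 && new_fb < NUM_FBS);
	assert(out->fb[new_fb] == FB_QUEUED);
	for (int i = 0; i < NUM_FBS; i++)
		if (out->fb[i] == FB_SCANOUT)
			out->fb[i] = FB_FREE;
	out->fb[new_fb] = FB_SCANOUT;
}
```

In this model a second vblank may arrive before flip_done for the previous frame, and
the Guest simply picks another free FB instead of reusing the old one. On bare-metal
both handlers would run back to back for the same event, so the behaviour there would
be unchanged.
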