On Mon, May 27, 2024 at 4:19 PM Victor Toso <victortoso@xxxxxxxxxx> wrote:
>
> Hi,
>
> On Tue, Apr 16, 2024 at 12:59:50PM GMT, Michael Scherle wrote:
> > Hello,
> >
> > Thanks for your changesets, they definitely reduce the delay
> > significantly (to a similar level as our provisional fixes, but yours
> > are much cleaner).
> >
> > On the client side (spice-gtk) I looked at the problem with the high
> > decoding time (2 frames of buffering) and was able to find a simple
> > fix with the help of the GStreamer community:
> >
> > ---
> >  src/channel-display-priv.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/channel-display-priv.h b/src/channel-display-priv.h
> > index 1a7590a..a2af1a7 100644
> > --- a/src/channel-display-priv.h
> > +++ b/src/channel-display-priv.h
> > @@ -177,7 +177,7 @@ static const struct {
> >       * (hardcoded in spice-server), let's add it here to avoid the warning.
> >       */
> >      { SPICE_DISPLAY_CAP_CODEC_H264, "h264",
> > -      "h264parse ! avdec_h264", "video/x-h264,stream-format=byte-stream" },
> > +      "h264parse ! avdec_h264", "video/x-h264,stream-format=byte-stream,alignment=au" },
> >
> >      /* SPICE_VIDEO_CODEC_TYPE_VP9 */
> >      { SPICE_DISPLAY_CAP_CODEC_VP9, "vp9",
> > @@ -185,7 +185,7 @@ static const struct {
> >
> >      /* SPICE_DISPLAY_CAP_CODEC_H265 */
> >      { SPICE_DISPLAY_CAP_CODEC_H265, "h265",
> > -      "h265parse ! avdec_h265", "video/x-h265,stream-format=byte-stream" },
> > +      "h265parse ! avdec_h265", "video/x-h265,stream-format=byte-stream,alignment=au" },
>
> jfyi, this was discussed in the past. It depends on how spice-server
> was configured too, no? I'm not sure, it has been a while. What I
> mean is: what/who is doing the H.264 encoding. We had a
> spice-streaming-agent that wrapped the guest GPU's H.264 encoding and
> sent it to the client, with the same protocol... depending on how it
> was configured, the stream-format was important, I think. Again,
> not 100% sure.

I think we used the same format. I also remember that we sent an
additional NAL unit to force the "flush", so I think it's very similar:
the stream renderer waits for the next SPICE packet because it does not
recognize that the frame has ended.

> >
> >      };
> >
> > --
> > 2.40.1
> >
> > However, this change should probably still be tested on different
> > setups. Since I don't know whether the streams are always AU-aligned,
> > I should probably find out about that.
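[As an illustration of the patch above, here is a minimal standalone
sketch, not the actual spice-gtk pipeline construction code; the element
choices and the appsrc setup are only assumptions. Declaring
alignment=au on the source caps promises one complete access unit per
buffer, so h264parse can emit each frame as soon as it arrives instead
of holding it back until the next buffer reveals the frame boundary.]

/* Sketch: H.264 byte-stream decoding with alignment=au on the caps. */
#include <gst/gst.h>

static GstElement *
build_pipeline(void)
{
    GError *error = NULL;
    /* "alignment=au" tells h264parse each buffer is a full access unit,
     * so it does not need to wait for the next one to close a frame. */
    GstElement *pipeline = gst_parse_launch(
        "appsrc name=src is-live=true format=time "
        "caps=\"video/x-h264,stream-format=byte-stream,alignment=au\" "
        "! h264parse ! avdec_h264 ! videoconvert ! autovideosink",
        &error);

    if (pipeline == NULL) {
        g_printerr("failed to build pipeline: %s\n", error->message);
        g_clear_error(&error);
    }
    return pipeline;
}

int
main(int argc, char *argv[])
{
    gst_init(&argc, &argv);
    GstElement *pipeline = build_pipeline();
    if (pipeline != NULL)
        gst_object_unref(pipeline);
    return 0;
}

[If the producer cannot guarantee one complete access unit per buffer,
advertising alignment=au would be incorrect, which is exactly the
caveat above about testing different setups.]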
> >
> > Also I have made other experiments, such as removing the
> > decoding_queue in channel-display-gst.c and adding the SpiceGstFrame
> > to the metadata of the GstBuffer instead, as well as completely
> > ignoring the display time of a frame and instead displaying frames
> > immediately. With that I got down to 60-80 ms of delay.
>
> If you send patches about this one, feel free to tag me. This
> looks cool.
>
> > Do you know if your changes or similar ones that reduce the
> > delay will go upstream soon?
> >
> > While looking through the source code, I found
> > SPICE_KEYPRESS_DELAY, which is not mentioned anywhere. Is its
> > only purpose to save some network traffic? Is there any reason
> > not to always set this to 0 in today's network environments?
> > (And maybe set the default to 0?)
>
> Introduced in c03e002152dc0c, commit log says:
>
> > widget: add keypress-delay property
> >
> > The delay before the press event is sent to the server if the key is
> > kept pressed. If the key is released within that time, that delay is
> > ignored and a single key-press-release event will be sent.
>
> Introduced in 2012. I'm pretty sure there were reasons for it.
> Not sure if it's worth removing.

Not much indication of why it was introduced. Besides reducing the
number of network packets (but not the traffic much; display traffic is
way bigger), I would suppose wonky networks. Suppose the network has
some weird latency patterns and you type (press and release) the "A"
key. You send a press request and a release request, but the server
receives the release only after a while (say, 1 second or more). This
could trigger key repetition in the guest, causing, for instance, "AAA"
to be typed. When typing normally, 100 ms is enough to release the key,
so even on wonky networks you won't hit key repetition due to network
delays. But that's a theory. Surely if you want to play a game this
delay is not helping :-)

> Cheers,
> Victor
>
> > Michael

Frediano
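[For reference, a minimal sketch of how a client application could turn
the delay off, assuming the "keypress-delay" widget property from the
commit quoted above; the header name and the property type are
assumptions based on spice-gtk's GObject API, not verified against a
specific release.]

/* Sketch: disable the keypress delay on a SpiceDisplay widget.
 * "keypress-delay" is the property added in c03e002152dc0c; setting it
 * to 0 should make the press event go out immediately instead of being
 * held back to coalesce a quick press+release into one message. */
#include <spice-client-gtk.h>

static void
disable_keypress_delay(SpiceDisplay *display)
{
    g_object_set(G_OBJECT(display), "keypress-delay", 0u, NULL);
}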
> > On 03.04.24 21:22, Frediano Ziglio wrote:
> > > Frediano
> > >
> > > On Tue, Apr 2, 2024 at 3:27 PM Michael Scherle
> > > <michael.scherle@xxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > Hi Frediano,
> > > >
> > > > thank you very much for your detailed answer.
> > > >
> > > > On 02.04.24 14:13, Frediano Ziglio wrote:
> > > > >
> > > > > Really short explanation: Lipsync.
> > > > >
> > > > > Less cryptic explanation: video streaming was added a long time
> > > > > ago, when desktops used 2D graphic drawing commands, like lines,
> > > > > fills, strings and so on. At that time networks were less
> > > > > reliable and latency was higher, so a continuous bitblt on the
> > > > > same big area was, with high probability, a playing video. The
> > > > > idea of detecting video playback and optimizing to keep audio
> > > > > and video in sync was therefore a good one.
> > > >
> > > > OK, this explains a lot.
> > > >
> > > > > Now come my opinionated ideas. The assumption that a continuous
> > > > > bitblt can only be a video stream is wrong: nowadays desktops
> > > > > use large bitblts for everything, or rather they use 3D cards a
> > > > > lot and compose the various windows on the screen, which appears
> > > > > to us as just bitblts, often contiguous. So the delay should
> > > > > simply be removed, optimizing for real-time video streaming. As
> > > > > you realized, the algorithm also keeps increasing the delay for
> > > > > every glitch it finds, which does not improve the user
> > > > > experience. I have several changesets removing all these delays
> > > > > entirely (it's possible to get this just by changing the server
> > > > > part); the result is much less delay, and the audio/video sync
> > > > > (watching a movie) is acceptable on today's networks.
> > > >
> > > > Would it be possible to get your changesets, so that I could try
> > > > them out? I would be interested to know how this can be
> > > > implemented with only server-side changes. A dirty idea I had
> > > > (and tried) would be to set the mm_time to the past so that the
> > > > client displays the image immediately, but that would not be a
> > > > good fix in my opinion.
> > >
> > > That's the commit
> > > https://cgit.freedesktop.org/~fziglio/spice-server/commit/?h=nvidia&id=eaaec7be80a9d402f425f7571bb27a082ebf739a.
> > >
> > > > I would rather consider it reasonable that the server timestamps
> > > > the frames (and perhaps the sound) with the encoding time and
> > > > that the client itself calculates when it wants to display them
> > > > (from the diffs). So the client could decide whether it wants to
> > > > display the images directly, or add some delay to compensate for
> > > > network jitter (or for lipsync), or maybe even implement
> > > > something like v-sync. These would of course be breaking changes
> > > > that would require changes to both client and server and would
> > > > make them incompatible with older versions. If this could not be
> > > > done directly, for compatibility reasons, maybe it could be
> > > > implemented in a separate low-latency mode or something like that
> > > > (which both server and client need to support).
> > >
> > > I suppose the negative time you thought of is something like
> > > https://cgit.freedesktop.org/~fziglio/spice-server/commit/?h=nvidia&id=4a1a2a20505bc453f30573a0d453a9dfa1d97e7c
> > > (which improves the previous one).
> > >
> > > > Even with the above ideas applied, for spice-gtk, I have noticed
> > > > a high decode delay. The GStreamer pipeline always seems to keep
> > > > at least 2 frames buffered (regardless of the frame rate), which
> > > > increases the delay further. Have you also noticed this? I'm
> > > > currently looking into the reason for this.
> > > >
> > > > When testing things out we saw that Sunshine/Moonlight performed
> > > > very well, with low delay and high QoE. That is the kind of
> > > > benchmark for remote access we strive for :)
> > > >
> > > > Greetings
> > > > Michael
> > > > >
> > > > > Frediano
> > > > >
> > > > > > On 15.03.24 14:08, Michael Scherle wrote:
> > > > > > > Hello spice developers,
> > > > > > >
> > > > > > > we are trying to develop an open-source virtual desktop
> > > > > > > infrastructure to be deployed at multiple German
> > > > > > > universities, as described by my colleagues in the paper,
> > > > > > > which I have put in the attachment. The solution is based
> > > > > > > on OpenStack, QEMU, SPICE... Our plan is also to have VM
> > > > > > > instances with virtual GPUs (SR-IOV). Due to the resulting
> > > > > > > requirements, it is necessary to transmit the image data as
> > > > > > > a video stream.
> > > > > > > We have seen Vivek Kasireddy's recent work on SPICE, which
> > > > > > > solves exactly this problem. However, when we tested it, we
> > > > > > > noticed a very high input-to-display delay (400 ms+, but
> > > > > > > only if the image data is transferred as a video stream).
> > > > > > > Is this a more general SPICE problem, is there something
> > > > > > > wrong with our setup, or are there special parameters that
> > > > > > > we are missing?
> > > > > > >
> > > > > > > Our setup:
> > > > > > >
> > > > > > > QEMU: https://gitlab.freedesktop.org/Vivek/qemu/-/commits/spice_gl_on_v2
> > > > > > > Spice: https://gitlab.freedesktop.org/Vivek/spice/-/commits/encode_dmabuf_v6
> > > > > > > virt-viewer
> > > > > > > Intel HW decoder/encoder (but the same with SW)
> > > > > > >
> > > > > > > I have looked into what is causing the delay and have
> > > > > > > noticed that encoding only takes about 3-4 ms. In general,
> > > > > > > the image seems to reach the client in less than 15 ms.
> > > > > > > The main problem seems to be that GStreamer gets a very
> > > > > > > high margin
> > > > > > > (https://gitlab.freedesktop.org/spice/spice-gtk/-/blob/master/src/channel-display.c?ref_type=heads#L1773)
> > > > > > > and therefore waits a long time before starting to decode.
> > > > > > > And the reason for the high margin seems to be a bad
> > > > > > > mm_time_offset
> > > > > > > (https://gitlab.freedesktop.org/spice/spice-gtk/-/blob/master/src/spice-session.c?ref_type=heads#L2418),
> > > > > > > which is used to map the server time to the client time
> > > > > > > (with some margin). This variable is initially set by the
> > > > > > > spice server to 400 ms
> > > > > > > (https://gitlab.freedesktop.org/spice/spice/-/blob/master/server/reds.cpp?ref_type=heads#L3062)
> > > > > > > and gets updated with the measured latency
> > > > > > > (https://gitlab.freedesktop.org/spice/spice/-/blob/master/server/reds.cpp?ref_type=heads#L2614),
> > > > > > > but only ever increased. I still need to see how this
> > > > > > > latency is calculated.
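[To make the mechanism concrete, here is a conceptual sketch of the
scheduling arithmetic described above. It is a paraphrase under assumed
names (frame_margin_ms and client_now_ms are hypothetical), not the
actual spice-gtk code; the real logic lives in channel-display.c and
spice-session.c at the links given.]

/* The server stamps each frame with its multimedia clock (mm_time);
 * the client converts its own clock into server time using
 * mm_time_offset and queues the frame until the margin expires.
 * Because mm_time_offset starts with 400 ms of slack on the server
 * side and is only ever increased, every frame inherits that much
 * extra input-to-display delay. */
#include <stdint.h>
#include <time.h>

static uint32_t mm_time_offset;   /* client clock minus server mm_time */

static uint32_t
client_now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint32_t)(ts.tv_sec * 1000 + ts.tv_nsec / 1000000);
}

/* Margin in ms: > 0 means the frame sits in the queue before being
 * decoded/displayed; <= 0 means it is handled immediately. */
static int32_t
frame_margin_ms(uint32_t frame_mm_time)
{
    uint32_t server_now = client_now_ms() - mm_time_offset;
    return (int32_t)(frame_mm_time - server_now);
}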
> > > > > > >
> > > > > > > Am I missing something, or is this design not intended for
> > > > > > > transmitting interactive content via video stream?
> > > > > > > Temporarily overwriting the margin and tweaking parameter
> > > > > > > settings on the msdkh264dec brought the delay down to about
> > > > > > > 80-100 ms, which is not yet optimal but usable. To see what
> > > > > > > is technically possible on my setup, I made a comparison
> > > > > > > using Moonlight/Sunshine, which resulted in a delay of
> > > > > > > 20-40 ms.
> > > > > > >
> > > > > > > Our goal is to achieve a round-trip time similar to the
> > > > > > > Moonlight/Sunshine scenario, for a properly usable desktop
> > > > > > > experience.
> > > > > > >
> > > > > > > Greetings
> > > > > > > Michael
> > > > > >
> > > > > > Greetings
> > > > > > Michael