Re: High delay of video-streams

Michael Scherle <michael.scherle@xxxxxxxxxxxxxxxxxx> · Tue, 18 Jun 2024 09:43:14 +0200

Hi,


Hi,
    sent my patch as MR.
https://gitlab.freedesktop.org/spice/spice/-/merge_requests/226.


thank you very much for this.
Frediano

Il giorno sab 1 giu 2024 alle ore 17:14 Frediano Ziglio
<freddy77@xxxxxxxxx> ha scritto:

Il giorno lun 27 mag 2024 alle ore 16:19 Victor Toso
<victortoso@xxxxxxxxxx> ha scritto:

Hi,

On Tue, Apr 16, 2024 at 12:59:50PM GMT, Michael Scherle wrote:
Hello,

Thanks for your changesets, they definitely reduce the delay significantly
(to a similar level as our provosoric fixes, but yours are much cleaner).

On the client side (spice-gtk) I looked at the problem with the high
decoding time (2 frames buffering) and was able to find a simple fix with
the help of the gstreamer community:

---
  src/channel-display-priv.h | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/channel-display-priv.h b/src/channel-display-priv.h
index 1a7590a..a2af1a7 100644
--- a/src/channel-display-priv.h
+++ b/src/channel-display-priv.h
@@ -177,7 +177,7 @@ static const struct {
       * (hardcoded in spice-server), let's add it here to avoid the warning.
       */
      { SPICE_DISPLAY_CAP_CODEC_H264, "h264",
-      "h264parse ! avdec_h264", "video/x-h264,stream-format=byte-stream" },
+      "h264parse ! avdec_h264",
"video/x-h264,stream-format=byte-stream,alignment=au" },

      /* SPICE_VIDEO_CODEC_TYPE_VP9 */
      { SPICE_DISPLAY_CAP_CODEC_VP9, "vp9",
@@ -185,7 +185,7 @@ static const struct {

      /* SPICE_DISPLAY_CAP_CODEC_H265 */
      { SPICE_DISPLAY_CAP_CODEC_H265, "h265",
-      "h265parse ! avdec_h265", "video/x-h265,stream-format=byte-stream" },
+      "h265parse ! avdec_h265",
"video/x-h265,stream-format=byte-stream,alignment=au" },

jfyi, this was discussed in the past. It depends how spice server
was configured too, no? I'm not sure, it has been awhile. What I
mean is, what/who is doing h264 encoding. We had a
spice-streaming-agent that wrapped guest's GPU h264 encoding and
sent to the client, with the same protocol.... depending how it
is configured, the stream-format was important I think. Again,
not 100% sure.


I think we  used the same format. I also remember that we sent an
additional NAL unit to force the "flush" so I think it's very similar,
the stream render waits the next SPICE packet as not recognizing the
frame to have ended.


  };

--
2.40.1

However, this change should probably still be tested on different setups.
Since I don't know whether they are always au aligned, I should probably
find out about that.

Also I have made other experiments, such as removing decoding_queue in
channel-display-gst.c and adding the SpiceGstFrame to the metadata of the
gstBuffer instead, as well as completely ignoring the display time of a
frame and instead displaying them immediately. With that i got down to
60-80ms delay.

If you send patches about this one, feel free to tag me. This
looks cool.

I haven't submitted any patches yet, as this was a prototype 
implementation to see if it works. I would still need to fix a few edge 
cases. I also realised that I would have to add some kind of 
jitterbuffer  again, because without buffering it only works smoothly on 
a good connection.


Do you know if your changes or similar ones that reduce the
delay will go upstream soon?

While looking through the source code, I found
SPICE_KEYPRESS_DELAY, which is not mentioned anywhere. Is this
the only use to save some network traffic? Is there any reason
not to always set this to 0 in today's network environments?
(And maybe set the default to 0?)

Introduced in c03e002152dc0c, commit log says:

  > widget: add keypress-delay property
  >
  > The delay before the press event is sent to the server if the key is
  > kept pressed. If the key is released within that time, that delay is
  > ignored and a single key-press-release event will be sent.

Introduced in 2012. I'm pretty sure there were reasons for it.
Not sure if worth to remove it.


Not much indication on why it was introduced. Beside reducing the
network packets (but not much the traffic, display traffic is way
bigger) I would suppose wonky networks. Suppose the network has quite
some weird latency patterns and you type (so press and release) "A"
key. You send a press request and a release request. But the server
receives the release after a while (say 1 second or more for
instance). This could trigger key repetition in the guest causing a
"AAA" (for instance) to be typed. Typing normally 100ms is enough to
release the key so even on wonky networks you won't hit key
repetitions due to network delays. But that's a theory. Surely if you
want to play a game this delay is not helping :-)

Cheers,
Victor

Michael


Greetings
Michael

Frediano

On 03.04.24 21:22, Frediano Ziglio wrote:
Frediano

Il giorno mar 2 apr 2024 alle ore 15:27 Michael Scherle
<michael.scherle@xxxxxxxxxxxxxxxxxx> ha scritto:

Hi Frediano,

thank you very much for your detailed answer.


On 02.04.24 14:13, Frediano Ziglio wrote:

Really short explanation: Lipsync.

Less cryptic explanation: video streaming was added much time ago when
desktops used 2D graphic drawings, like lines, fillings, strings and
so on. At that time networks were more unreliable, latency bigger, and
with high probability a continuous bitblt on the same big area was a
video playing. So the idea of detecting the video playing and
optimizing to sync audio and video was a good idea.

ok this explains a lot.

Now starts my opinionated ideas. The idea of continuous bitblt being
only a video stream is wrong, nowadays desktops do use large bitblt
for everything, or better they use 3D cards a lot and compose the
various windows on the screen which appears to us as just bitblt,
often contiguous. So the delay should just be removed optimizing for
real time video streaming. As you realize the algorithm also keeps
increasing the delay for every glitch found which is not improving the
user experience. I have different changesets removing entirely all
these delays (it's possible to get this just by changing the server
part), the result is much less delay, the audio/video sync (watching a
movie) is, with nowadays networks, acceptable.


Would it be possible to get your changesets, so that I could try them
out? I would be interested to know how this can be implemented with only
server-side changes. A dirty idea I had (and tried) would be to set the
mm_time to the past so that the client displays the image immediately,
but that would not be a good fix in my opinion.


That's the commit
https://cgit.freedesktop.org/~fziglio/spice-server/commit/?h=nvidia&id=eaaec7be80a9d402f425f7571bb27a082ebf739a.

I would rather consider it reasonable that the server timestamps the
frames (and perhaps the sound) with the encoding time and that the
client itself calculates when it wants to display them (from the diffs).
So the client could decide if it wants to display the images directly or
add some delay to compensate for network jitter (or lipsync) or maybe
even implement something like v-sync. These would of course be breaking
changes that would require changes to the client and server and would
make them incompatible with older versions. If this could not be done
directly, due to compatibility reasons, maybe this could be implemented
in a separate low latency mode or something like that (which both server
and client needs to support).


I suppose the negative time you though is something like
https://cgit.freedesktop.org/~fziglio/spice-server/commit/?h=nvidia&id=4a1a2a20505bc453f30573a0d453a9dfa1d97e7c
(which improve the previous).

Even with above ideas applied, for spice-gtk, I have noticed a high
decode delay. The gstreamer pipeline always seems to keep at least 2
frames in the pipeline (regardless of the frame rate) which increases
the delay further. Have you also noticed this? I'm currently looking
into the reason for this.

When testing stuff out we saw that Sunshine/Moonlight performed very
well in generating  a low delay and high QoE. That is kind of our
benchmark for remote access to strive for :)

Greetings
Michael


Frediano


On 15.03.24 14:08, Michael Scherle wrote:
Hello spice developers,

we are trying to develop an Open Source virtual desktop infrastructure
to be deployed at multiple German universities as described, by my
colleagues, in the paper which I have put in the attachment. The
solution based on openstack, qemu, spice... Our plan is also to have VM
instances with virtual GPUs (SR-IOV). Due to the resulting requirements,
it is necessary to transmit the image data as a video stream.
We have seen Vivek Kasireddy recent work on spice which solves exactly
this problem. However, when we tested it, we noticed a very high input
to display delay (400 ms+ but only if the image data is transferred as
video-stream). However, the problem seems to be a more general spice
problem or is there something wrong with our setup or are there special
parameters that we are missing?

Our setup:

QEMU: https://gitlab.freedesktop.org/Vivek/qemu/-/commits/spice_gl_on_v2
Spice:
https://gitlab.freedesktop.org/Vivek/spice/-/commits/encode_dmabuf_v6
virt-viewer
Intel HW decoder/encoder (but same with sw)

I have looked into what is causing the delay and have noticed that
encoding only takes about 3-4ms. In general, the image seems to reach
the client in less than 15ms.
The main problem seems to be that gstreamer gets a very high
margin(https://gitlab.freedesktop.org/spice/spice-gtk/-/blob/master/src/channel-display.c?ref_type=heads#L1773) and therefore waits a long time before starting decoding. And the reason for the high margin seems to be the bad mm_time_offset https://gitlab.freedesktop.org/spice/spice-gtk/-/blob/master/src/spice-session.c?ref_type=heads#L2418 which is used to offset the server time to the client time (with some margin). And this variable is set by the spice server to initially 400 ms https://gitlab.freedesktop.org/spice/spice/-/blob/master/server/reds.cpp?ref_type=heads#L3062 and gets updated with the latency https://gitlab.freedesktop.org/spice/spice/-/blob/master/server/reds.cpp?ref_type=heads#L2614 (but only increased). I still need to see how this latency is calculated.

Am I missing something or is this design not intended for transmitting
interactive content via video stream?
Temporarily overwriting the margin and tweaking parameter settings on
the msdkh264dec brought the delay to about 80-100ms, which is not yet
optimal but usable. To see what is technical possible on my setup, I
made a comparison using moonlight/sunshine which resulted in an delay of
20-40ms.

Our goal is to achieve some round trip time similar to the
moonlight/sunshine scenario to achieve a properly usable desktop
experience.

Greetings
Michael

Greetings
Michael