Re: [spice-gtk] main: Don't delay update_display_timer(0) for up to 1 second

On Wed, 2016-06-01 at 15:47 +0200, Pavel Grunt wrote:
> Hi Christophe,
> 
> On Wed, 2016-05-25 at 18:14 +0200, Christophe Fergeau wrote:
> > 
> > When using remote-viewer --full-screen with a VM/client with multiple
> > monitors, a race can be observed during auto-configuration. First, the
> > client monitors config is sent to the guest:
> > (remote-viewer:19480): GSpice-DEBUG: channel-main.c:1166 main-1:0: sending new monitors config to guest
> > (remote-viewer:19480): GSpice-DEBUG: channel-main.c:1183 main-1:0: monitor #0: 1920x1080+0+0 @ 32 bpp
> > (remote-viewer:19480): GSpice-DEBUG: channel-main.c:1183 main-1:0: monitor #1: 1920x1080+1920+0 @ 32 bpp
> > 
> > Then we receive messages from the agent which trigger a call to
> > update_display_timer(0)
> > 
> > This should cause the current monitors state to be sent to the server
> > very soon after this call. However, in the racy case, this is delayed
> > for nearly a second, and timer_set_display() ends up being called at the
> > wrong time, when information about the first display channel has been
> > received from the server, but not the second one.
> this is strange

Perhaps, but there's an inherent race here that's quite hard to avoid. Under
normal circumstances, when virt-viewer starts up, it waits for the server to
tell it the display configuration so it knows how many windows to create. The
server's behavior is to send a MonitorsConfig message as soon as the display
channel is created, so under ideal network conditions it should be received very
soon.

When starting in fullscreen mode, virt-viewer attempts to configure the remote
guest to have the same number of monitors as the local client machine, and to
configure them to be the same size as the client monitors. It does this by
sending a MonitorsConfig message immediately after the Main channel indicates
that the agent is connected. It does not wait to receive the display
configuration from the server, since we don't want to use the current guest
configuration. For example, if the guest currently had 4 displays enabled and
we had 2 client monitors, we wouldn't want to open 4 windows and then suddenly
close 2 of them.

So, since the client and the server are both attempting to send configuration
messages at startup, and they're doing it via different channels (Main and
Display), it's quite difficult to synchronize these two events and avoid a race.
See below for more detail on the difficulties here.
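
For reference, this is roughly what that client-side push looks like through
the spice-gtk main channel API. This is only a minimal sketch, not virt-viewer's
actual code; the helper name and the monitor geometry are made up for
illustration:

#include <spice-client.h>

/* Minimal sketch (not virt-viewer's actual code) of a client pushing a
 * two-monitor layout to the guest once the main channel reports that the
 * agent is connected. Geometry values are made up. */
static void push_client_monitors_config(SpiceMainChannel *main_channel)
{
    /* Describe two side-by-side 1920x1080 client monitors */
    spice_main_set_display(main_channel, 0,    0, 0, 1920, 1080);
    spice_main_set_display(main_channel, 1, 1920, 0, 1920, 1080);
    spice_main_set_display_enabled(main_channel, 0, TRUE);
    spice_main_set_display_enabled(main_channel, 1, TRUE);

    /* Ask the main channel to send the new configuration to the guest agent */
    spice_main_send_monitor_config(main_channel);
}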

> > 
> > 
> > When this happens, we inform the server that only one monitor is active,
> > then the server sends a MonitorsConfig message with 2 monitors (first
> > request we sent), and then with just 1 monitor (second request we sent).
> Is the server doing everything correctly ?

It is behaving as designed, yes. The only requirements for the server are A) to
send a MonitorsConfig message when a client connects, and B) to send a
MonitorsConfig message when the QXL driver indicates that the display
configuration has changed.

If the client sends a 2-monitor config followed by a 1-monitor config, these
messages will probably result in the guest re-configuring its desktop to 2
monitors (which triggers a new MonitorsConfig message, as explained above) and
then re-configuring it to 1 monitor (which again triggers a new MonitorsConfig
message).


> > 
> > 
> > This causes remote-viewer to show one fullscreen black window indicating
> > "Connected to server" on one monitor and a working fullscreen window on
> > the second monitor.
> > 
> > update_display_timer(0) schedules a timeout to be run with
> > g_timeout_add_seconds(0). However, g_timeout_add_seconds() schedules
> > timers with a granularity of a second, so the timeout we scheduled with
> > g_timeout_add_seconds() may fire up to 1 second later than when we
> > called it, while we really wanted it to fire as soon as possible.
> > Special-casing update_display_timer(0) and using g_timeout_add()
> What about only using g_timeout_add() ?
> 
> > 
> >  in that
> > case avoid this issue. In theory, the race could probably still happen
> > with a very very bad timing, 
> I think we should focus on the race (and the behaviour of the server ?).
> 
> So if I understand it correctly, the client sends a resize request, but at the
> same time the server sends info about monitors, and this causes the problem.
> What about waiting with the resize request till we know that display channels
> are ready.

I've thought about doing this in the past, but unfortunately it's not very easy.
For Linux guests it's not so difficult, since they only have a single display
channel that can support N monitors. So we receive a single MonitorsConfig
message from the server at startup indicating how many of those N displays are
enabled. But Windows guests have a separate display channel for each monitor. So
each enabled monitor will send out its own MonitorsConfig message at startup,
but disabled monitors will not send anything. So if we receive a single
MonitorsConfig message at startup, does that mean that only one monitor is
enabled, or does it mean that the other messages simply haven't arrived yet?
It's not currently possible to know that.

You could set a timer and assume that if we don't receive any more
MonitorsConfig messages within M seconds after connection, those displays that
haven't sent a MonitorsConfig message are disabled. But that has a couple of
problems: A) it doesn't solve the race, it just pushes it out to M seconds after
startup, and B) it delays startup, since we will have to wait M seconds before
configuring the fullscreen monitors. If we try to reduce M to a smaller value to
reduce the startup delay, it will increase the chance of hitting the race. Not
to mention that low-bandwidth or high-latency connections may prevent these
messages from arriving within M seconds at all...
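
Just to make the trade-off concrete, here is a minimal GLib-only sketch of that
"wait M seconds" idea (the callback body is a placeholder and the demo main
loop is made up; this is the approach I'm arguing against, not something I'm
proposing):

#include <glib.h>

/* Sketch of the "wait M seconds for MonitorsConfig" heuristic discussed
 * above. Whatever value we pick for M, a MonitorsConfig message arriving
 * after the timeout loses the race, and fullscreen configuration is
 * delayed by up to M seconds. */

#define MONITORS_CONFIG_TIMEOUT_SECONDS 2   /* "M", arbitrary */

static gboolean monitors_config_timeout_cb(gpointer user_data)
{
    /* Here we would treat every display channel that has not sent a
     * MonitorsConfig message yet as disabled and send our own fullscreen
     * configuration. In this sketch we just stop the demo loop. */
    g_main_loop_quit(user_data);
    return G_SOURCE_REMOVE;
}

int main(void)
{
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);

    /* In a real client this would be armed when the display channels are
     * created and reset each time a MonitorsConfig message arrives. */
    g_timeout_add_seconds(MONITORS_CONFIG_TIMEOUT_SECONDS,
                          monitors_config_timeout_cb, loop);

    g_main_loop_run(loop);
    g_main_loop_unref(loop);
    return 0;
}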

> 
> I have a feeling that it would be nice to have some mechanism to confirm that
> the resize request was processed by the server. And I also have a feeling that
> we have discussed this before.

Yeah, that could make things simpler, but I think it would require a significant
redesign in the server. Right now, there's no correlation between resize
requests received by the server and MonitorsConfig messages sent by the server.
As mentioned above, a MonitorsConfig message is only sent in response to an
actual configuration change signaled by the guest driver. This configuration
change can be triggered by a resize request from the client, but there's no way
to correlate a request with a response. A single resize request may trigger 0,
1, 2, or more MonitorsConfig messages sent from the server.

Another thing that could potentially help is a server message indicating that
"startup" is complete, or a guarantee that at startup all display channels will
send out a MonitorsConfig message (with a 0x0 size perhaps) even if the monitor
is disabled. But backwards compatibility with older servers must also be
considered.

So, in summary: I don't see an easy way to fully solve this race condition, but
this patch should improve things somewhat. So I'm inclined to ACK it.
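
If it helps to see the granularity issue in isolation, here is a small
standalone GLib program (just a demo sketch for this discussion, not part of
the patch) that measures how long each flavour of timeout takes to fire:

/* Demo of the timeout granularity the patch works around. With second
 * granularity, g_timeout_add_seconds(0) can fire anywhere from almost
 * immediately up to nearly a full second later, while g_timeout_add(0)
 * fires on the next main loop iteration.
 * Build (assumed): gcc demo.c $(pkg-config --cflags --libs glib-2.0) */
#include <glib.h>

typedef struct {
    const char *name;
    gint64 start;
    GMainLoop *loop;
} TimerInfo;

static gboolean fired(gpointer user_data)
{
    TimerInfo *info = user_data;
    g_print("%s fired after %" G_GINT64_FORMAT " ms\n", info->name,
            (g_get_monotonic_time() - info->start) / 1000);
    g_main_loop_quit(info->loop);
    return G_SOURCE_REMOVE;
}

int main(void)
{
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);

    TimerInfo with_seconds = { "g_timeout_add_seconds(0)",
                               g_get_monotonic_time(), loop };
    g_timeout_add_seconds(0, fired, &with_seconds);
    g_main_loop_run(loop);      /* typically reports up to ~1000 ms */

    TimerInfo with_millis = { "g_timeout_add(0)",
                              g_get_monotonic_time(), loop };
    g_timeout_add(0, fired, &with_millis);
    g_main_loop_run(loop);      /* typically reports ~0 ms */

    g_main_loop_unref(loop);
    return 0;
}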

> 
> Pavel
> 
> > 
> > but in practice I don't think it will be
> > possible to trigger it after this change.
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1323092
> > ---
> >  src/channel-main.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/channel-main.c b/src/channel-main.c
> > index e5a70af..946b280 100644
> > --- a/src/channel-main.c
> > +++ b/src/channel-main.c
> > @@ -1560,7 +1560,16 @@ static void update_display_timer(SpiceMainChannel *channel, guint seconds)
> >      if (c->timer_id)
> >          g_source_remove(c->timer_id);
> >  
> > -    c->timer_id = g_timeout_add_seconds(seconds, timer_set_display, channel);
> > +    if (seconds != 0) {
> > +        c->timer_id = g_timeout_add_seconds(seconds, timer_set_display, channel);
> > +    } else {
> > +        /* We need to special case 0, as we want the callback to fire as soon
> > +         * as possible. g_timeout_add_seconds(0) would set up a timer which would fire
> > +         * at the next second boundary, which might be nearly 1 full second later.
> > +         */
> > +        c->timer_id = g_timeout_add(0, timer_set_display, channel);
> > +    }
> > +
> >  }
> >  
> >  /* coroutine context  */
_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/spice-devel



