Re: [PATCH 8/9] worker: make sure we dispatch after releasing items

> Hi
> 
> ----- Original Message -----
> > On Wed, 2015-12-16 at 18:16 -0500, Marc-André Lureau wrote:
> > > Hi
> > > 
> > > ----- Original Message -----
> > > I actually think that was it. The worker thread may currently block
> > > because
> > > the client is too slow. But then, when we release item, the pipe is going
> > > to
> > > be smaller hopefully, and thus we can try to dispatch again without
> > > waiting
> > > for the timeout. (waiting for timers can really degrade performances in
> > > some
> > > cases, like you see today with the qemu virgl timer, but that's another
> > > story where I can show you figures when we merge gl scanounts in spice ;)
> > 
> > Then I'm still confused. In the old loop, we had a very explicit sequence
> > of
> > behavior. For each loop:
> > 
> >  1 calculate timeout for poll() function
> >  2 poll() fds with timeout from step 1
> >  3 service expired timers
> >  4 handle fd watch events
> >  5 process cursor commands from guest
> >  6 process display commands from guest
> >  7 push queued messages to clients
> >  8 GOTO 1
> > 
> > After pushing the queued messages to the client, we did not go back and
> > handle
> > cursor and display commands again. We went back to the top of the loop and
> > did a
> > poll with timeout again. So I fail to see why the glib loop needs to short
> > -circuit the timeout and return to step 5 when the old design didn't.
> > 
> > Or is this change unrelated to the new glib loop and the worker has simply
> > always been too slow even with the old loop implementation?
> > 
> 
> I wouldn't say "too slow", and I wouldn't try to mimic the old code either.
> For me the goal is simply to process the pipe/command as fast as possible
> and if possible, get rid of timeouts.

I can understand that not mimicking the old code can be better, but at the moment this is
causing some regressions, and this patch was trying to avoid some of them (I think).

I looked more deeply at the glib code. The sources are stored (appended) in multiple
lists, one per priority (and at the moment we are using a single priority, which is
similar to the former code anyway). Considering that the processing code is registered
at worker initialization, and that watches are recreated whenever their event masks are
updated, the watches can end up after the processing source, making the loop effectively:

 1 calculate timeout for poll() function
 2 poll() fds with timeout from step 1
 3 process cursor commands from guest
 4 process display commands from guest
 5 push queued messages to clients
 6 service expired timers
 7 handle fd watch events
 8 GOTO 1
(I'm not sure about the timers, but they are not that frequent.)
This obviously makes things worse: after step 7 you have to wait for the poll (step 2)
before commands get processed again.
I'm not saying the patch doesn't address other issues too, just that the glib code
actually makes this patch more useful (that is, the glib loop introduced regressions).
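To make the ordering argument concrete, here is a minimal simulation (hypothetical names, not the real glib internals) of the behavior described above: sources of equal priority sit in one append-ordered list, so a watch that is destroyed and re-created when its event mask changes lands *after* the long-lived command-processing source, and gets dispatched after it on each iteration.

```python
# Toy model of a single-priority source list (an assumption for illustration,
# not glib's actual data structures).

class Loop:
    def __init__(self):
        self.sources = []          # single priority: one ordered list
        self.order = []            # names dispatched in the last iteration

    def attach(self, name):
        self.sources.append(name)  # new sources are appended

    def recreate(self, name):
        # what happens when an event mask is updated: remove + re-append
        self.sources.remove(name)
        self.sources.append(name)

    def iterate(self):
        # one dispatch pass, in list order (poll/timeout handling omitted)
        self.order = list(self.sources)

loop = Loop()
loop.attach("fd_watch")            # watch created early
loop.attach("process_commands")    # registered at worker initialization
loop.recreate("fd_watch")          # event mask updated -> watch recreated
loop.iterate()
print(loop.order)                  # -> ['process_commands', 'fd_watch']
```

With this ordering, fd events (step 7) are handled after command processing, so newly readable data cannot be turned into commands until after the next poll.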

Looking at this more broadly: what are this patch (and possibly "worker: avoid server hanging
when no client are connected.") trying to fix?
Basically, handling slow clients. With fast clients you don't have many issues: the pipe
queue is (or should be) quite small and all commands get processed. But with slow clients
we process commands only until the queue reaches a given size. I honestly don't have full
knowledge of what happens next. When you do a push on RedChannel you try to send items
until the network queue is full. If the client is simply not reading, you can only keep
adding items to the queue. After a while the limit is reached, and to avoid queuing
indefinitely you stop processing commands. Now I have some questions:
- what happens in the guest? Does the screen just stop getting updates after a while?
  And what if the client asks for an update?
- does item collapsing occur only when no client is connected? I think what should
  happen is that if there are too many items, they should be collapsed (drawing
  reduced onto surfaces) and processing should continue.
- what happens if the client keeps sending requests that trigger appending items but
  never reads? Can a DoS be triggered?
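The backpressure mechanism described above can be sketched as follows (a hypothetical illustration with invented names and an invented limit, not the real spice-server code): commands become pipe items until the pipe reaches a size limit, so a client that never reads eventually stalls command processing.

```python
MAX_PIPE_SIZE = 50   # hypothetical limit on queued pipe items

class Channel:
    def __init__(self):
        self.pipe = []

    def push(self, client_can_read):
        # try to send as many items as the network queue accepts
        while self.pipe and client_can_read():
            self.pipe.pop(0)

def process_commands(channel, commands, client_can_read):
    """Process guest commands until the pipe queue hits its limit."""
    processed = 0
    for cmd in commands:
        if len(channel.pipe) >= MAX_PIPE_SIZE:
            break                      # stop processing: queue is full
        channel.pipe.append(cmd)       # queue a pipe item for this command
        processed += 1
        channel.push(client_can_read)  # drain what the client will accept
    return processed

ch = Channel()
# a client that never reads: the limit caps how many commands we process
n = process_commands(ch, range(200), lambda: False)
print(n)   # 50 -> the remaining guest commands wait until the client drains
```

This is exactly where the questions above bite: once `processed` stops growing, the guest sees no further updates until the client drains the queue (or items are collapsed).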

Frediano
_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/spice-devel



