Re: [PATCH 8/9] worker: make sure we dispatch after releasing items

Frediano Ziglio <fziglio@xxxxxxxxxx> · Tue, 22 Dec 2015 11:53:57 -0500 (EST)

> 
> On Mon, 2015-12-21 at 09:54 -0600, Jonathon Jongsma wrote:
> > On Mon, 2015-12-21 at 10:17 -0500, Frediano Ziglio wrote:
> > > > 
> > > > On Fri, 2015-12-18 at 06:10 -0500, Frediano Ziglio wrote:
> > > > > > Hi
> > > > > > 
> > > > > > ----- Original Message -----
> > > > > > > On Wed, 2015-12-16 at 18:16 -0500, Marc-André Lureau wrote:
> > > > > > > > Hi
> > > > > > > > 
> > > > > > > > ----- Original Message -----
> > > > > > > > I actually think that was it. The worker thread may currently
> > > > > > > > block
> > > > > > > > because
> > > > > > > > the client is too slow. But then, when we release item, the
> > > > > > > > pipe
> > > > > > > > is
> > > > > > > > going
> > > > > > > > to
> > > > > > > > be smaller hopefully, and thus we can try to dispatch again
> > > > > > > > without
> > > > > > > > waiting
> > > > > > > > for the timeout. (waiting for timers can really degrade
> > > > > > > > performances
> > > > > > > > in
> > > > > > > > some
> > > > > > > > cases, like you see today with the qemu virgl timer, but that's
> > > > > > > > another
> > > > > > > > story where I can show you figures when we merge gl scanounts
> > > > > > > > in
> > > > > > > > spice
> > > > > > > > ;)
> > > > > > > 
> > > > > > > Then I'm still confused. In the old loop, we had a very explicit
> > > > > > > sequence
> > > > > > > of
> > > > > > > behavior. For each loop:
> > > > > > > 
> > > > > > >  1 calculate timeout for poll() function
> > > > > > >  2 poll() fds with timeout from step 1
> > > > > > >  3 service expired timers
> > > > > > >  4 handle fd watch events
> > > > > > >  5 process cursor commands from guest
> > > > > > >  6 process display commands from guest
> > > > > > >  7 push queued messages to clients
> > > > > > >  8 GOTO 1
> > > > > > > 
> > > > > > > After pushing the queued messages to the client, we did not go
> > > > > > > back
> > > > > > > and
> > > > > > > handle
> > > > > > > cursor and display commands again. We went back to the top of the
> > > > > > > loop
> > > > > > > and
> > > > > > > did a
> > > > > > > poll with timeout again. So I fail to see why the glib loop needs
> > > > > > > to
> > > > > > > short
> > > > > > > -circuit the timeout and return to step 5 when the old design
> > > > > > > didn't.
> > > > > > > 
> > > > > > > Or is this change unrelated to the new glib loop and the worker
> > > > > > > has
> > > > > > > simply
> > > > > > > always been too slow even with the old loop implementation?
> > > > > > > 
> > > > > > 
> > > > > > I wouldn't say "too slow", and I wouldn't try to mimic the old code
> > > > > > either.
> > > > > > For me the goal is simply to process the pipe/command as fast as
> > > > > > possible
> > > > > > and if possible, get rid of timeouts.
> > > > > 
> > > > > I can understand that not mimic can be better but at the moment this
> > > > > is
> > > > > causing
> > > > > a bit of regressions and this patch was trying to avoid some (I
> > > > > think).
> > > > > 
> > > > > I looked more deeply at glib code. The sources are stored (appended)
> > > > > in
> > > > > multiple
> > > > > list (each for every priority and given at the moment we are using a
> > > > > single
> > > > > priority which was similar to former code anyway). Considering that
> > > > > the
> > > > > processing
> > > > > code is registered at worker initialization and that watches are
> > > > > recreated
> > > > > when
> > > > > event masks are updated this possibly lead the watched to be after
> > > > > the
> > > > > process
> > > > > leading the loop to be:
> > > > > 
> > > > >  1 calculate timeout for poll() function
> > > > >  2 poll() fds with timeout from step 1
> > > > >  3 process cursor commands from guest
> > > > >  4 process display commands from guest
> > > > >  5 push queued messages to clients
> > > > >  6 service expired timers
> > > > >  7 handle fd watch events
> > > > >  8 GOTO 1
> > > > > (not sure about timers but are not that frequent).
> > > > > This obviously make stuff worse as after step 7 you have to wait the
> > > > > poll
> > > > > (2)
> > > > > to get commands processed again.
> > > > > I'm not saying that the patch address also some other issues but that
> > > > > the
> > > > > glib
> > > > > code
> > > > > actually make this patch more useful (that is glib have regressions).
> > > > > 
> > > > > On a more wide looking. What are this (and possibly "worker: avoid
> > > > > server
> > > > > hanging
> > > > > when no client are connected.") patch trying to fix?
> > > > > Basically trying to handle slow clients. On fast clients you don't
> > > > > have
> > > > > many
> > > > > issues.
> > > > > Pipe queue is (or should be) quite small and all commands are
> > > > > processed.
> > > > > But
> > > > > on slow
> > > > > clients we process till the queue reach a given size. I honestly
> > > > > don't
> > > > > have
> > > > > all
> > > > > knowledge on what happen next. When you do a push on RedChannel you
> > > > > try
> > > > > to
> > > > > send as
> > > > > many items till network queue is full. If the client is just not
> > > > > processing
> > > > > you can
> > > > > only add items to the queue. After a while the limit is reached and
> > > > > to
> > > > > avoid
> > > > > queuing
> > > > > indefinitely you stop processing commands. Now I would have some
> > > > > questions:
> > > > > - what happen in the guest? Just after a while the screen get no more
> > > > > updates?
> > > > >   And if the client ask for an update?
> > > > > - does item collapsing occurs only if client is not connected? I
> > > > > think
> > > > > that
> > > > > what should
> > > > >   happen is that if there are too items items should be collapsed
> > > > > (reduced
> > > > > drawing
> > > > >   on surfaces) and processing should continue.
> > > > > - what happen if client keep sending request that trigger appending
> > > > > items
> > > > > but
> > > > > avoid
> > > > >   the read? Can a DoS be triggered?
> > > > > 
> > > > > Frediano
> > > > 
> > > > 
> > > > 
> > > > So I did some very simplistic testing here. Basically I looked at the
> > > > red_process_display() function to see how often we encountered a full
> > > > pipe
> > > > queue
> > > > (since that is the situation that Marc-Andre thinks this patch is
> > > > trying
> > > > to
> > > > improve). I was curious how often this happened, so I just instrumented
> > > > the
> > > > code
> > > > to count how many times we called red_process_display() and how many of
> > > > those
> > > > times we returned from this function due to a full queue (rather than
> > > > because
> > > > there were no more commands to process). Here's the patch:
> > > > 
> > > > diff --git a/server/red-worker.c b/server/red-worker.c
> > > > index dfaf4ba..dec635d 100644
> > > > --- a/server/red-worker.c
> > > > +++ b/server/red-worker.c
> > > > @@ -229,7 +235,10 @@ static int red_process_display(RedWorker *worker,
> > > > uint32_t
> > > > max_pipe_size, int *r
> > > >      QXLCommandExt ext_cmd;
> > > >      int n = 0;
> > > >      uint64_t start = spice_get_monotonic_time_ns();
> > > > +    static int total = 0;
> > > > +    static int full = 0;
> > > >  
> > > > +    total++;
> > > >      if (!worker->running) {
> > > >          *ring_is_empty = TRUE;
> > > >          return n;
> > > > @@ -335,6 +344,9 @@ static int red_process_display(RedWorker *worker,
> > > > uint32_t
> > > > max_pipe_size, int *r
> > > >              return n;
> > > >          }
> > > >      }
> > > > +    full++;
> > > > +    fprintf(stderr, "display channel exceeded max pipe size: %i/%i
> > > > (%g)\n",
> > > > +            full, total, (double)full / total * 100.0);
> > > >      return n;
> > > >  }
> > > >  
> > > > 
> > > > This was my procedure:
> > > > 
> > > > - Boot RHEL7 guest
> > > > - wait for boot sequence to finish
> > > > - log into GNOME
> > > > - start web browser, open a page
> > > > - open another page in another tab
> > > > - open video player and play video in full screen
> > > > - enable second display
> > > > - resize second display
> > > > - disable second display
> > > > - exit video player
> > > > - shut down guest
> > > > 
> > > > I did this same test with the current master branch, with the glib loop
> > > > patch
> > > > applied, and also with this patch on top of the glib loop patch.
> > > > 
> > > > What I noticed is that during the boot sequence and the shutdown
> > > > sequence
> > > > (when
> > > > we're in a VT and text is scrolling up the screen), we hit the queue
> > > > limit
> > > > quite
> > > > frequently. In fact, nearly 30% of the time that we call
> > > > red_process_display(),
> > > > we return because our pipe queue is full. However, after X has started
> > > > and
> > > > we're
> > > > in the desktop, we never hit a full-queue condition. When I shut down
> > > > the
> > > > guest,
> > > > it briefly returns to the VT until it powers down, and we again hit
> > > > some
> > > > full
> > > > -queue conditions.
> > > > 
> > > > This was true of both the old loop and the new glib loop (with and
> > > > without
> > > > the
> > > > patch).
> > > > 
> > > > There were some differences between the old and new loop. In
> > > > particular,
> > > > the
> > > > new
> > > > glib loop called red_process_display() more times in roughly the same
> > > > elapsed
> > > > time. It also had a higher percentage of calls that returned because
> > > > the
> > > > queue
> > > > was full (but again, this did not happen during "normal" operation, but
> > > > only
> > > > during the boot and shutdown phase). The glib loop with this patch on
> > > > top
> > > > had
> > > > roughly the same percentage as the glib loop on its own, but the total
> > > > number
> > > > of
> > > > calls to red_process_display() was higher, unsurprisingly.
> > > > 
> > > > 
> > > > Here is some example output:
> > > > 
> > > > Old loop: http://ur1.ca/ocnyn
> > > > Glib loop: http://ur1.ca/ocnyp
> > > > glib loop with this patch applied: http://ur1.ca/oco1u
> > > > 
> > > > Notice that lines such as this indicate that GDM has started or I've
> > > > logged
> > > > into
> > > > GNOME:
> > > > 
> > > >    main_channel_handle_parsed: agent start
> > > > 
> > > > Then I use the desktop for about a minute before shutting down again.
> > > > No
> > > > more
> > > > log messages are printed until shutdown begins.
> > > > 
> > > >  I also did another test where I used the guest "normally" (browsing
> > > >  internet,
> > > > watching videos, writing text in libreoffice, etc) for quite a while,
> > > > and
> > > > still
> > > > never triggered the full-queue condition.
> > > > 
> > > > I'm not sure what conclusions to draw from this exactly. There is
> > > > probably
> > > > some
> > > > better way to test this. Perhaps we should test how long a pipe item
> > > > stays
> > > > in
> > > > the queue before it is sent.
> > > > 
> > > > Jonathon
> > > > 
> > > 
> > > That's great!
> > > 
> > > I'm quite scare about the loop counts using Glib. Mainly looks like loop
> > > is
> > > executed twice as much (with possible cpu usage increased).
> > 
> > Yeah, with the plain glib loop it's about 1.5x, but with the additional
> > wakeup
> > -after-releasing-a-pipe-item patch, it's closer to double.
> > 
> > 
> > > 
> > > I agree would be helpful to see the latency of the items.
> > > 
> > > Did you try increasing client latency or decreasing bandwidth?
> > > I think replay utility could be useful for these tests (didn't try).
> > > 
> > > Frediano
> > 
> > 
> > 
> > Nope, I didn't try any latency or bandwidth adjustments yet. This was just
> > a
> > first step to get a rough comparison and then figure out exactly what we
> > should
> > be testing. I haven't actually used the reply utility much yet, so I may
> > have
> > some questions about it when I start using it. But that's a good
> > suggestion.
> > 
> > Jonathon
> 
> 
> OK, here's some initial test results showing the amount of time a pipe item
> waits in the queue before being sent out. Basically I save the time when a
> pipe
> item is is added to the queue, and then check the time when that item is sent
> to
> the client. The procedure I used to test is basically the same as that
> describe
> above for my previous test. I
> 
> Old loop:
> ========
> - First run:
>     avg=1.64109ms, max=56.881ms, min=5.4e-05ms (n=85801)
> 
> - second run (with queue time distribution):
>     avg=1.66152ms, max=60.9958ms, min=4.1e-05ms (n=84801)    queue time
>     distribution: <0.1ms=3495, <0.2ms=509, <0.5ms=7035,
>     <1ms=36476, <2ms=26316, <5ms=7399, <10ms=1285, <20ms=1365, <50ms=914,
>     >50ms=7
>     	
> 
> 
> glib loop:
> =========
> - First run:
>     avg=1.78949ms, max=49.901ms, min=4.8e-05ms (n=85801)
> 
> - second run (with queue time distribution):
>     avg=1.67802ms, max=65.0128ms, min=3.9e-05ms (n=87601)
>     queue time distribution: <0.1ms=4313, <0.2ms=500, <0.5ms=6088,
>     <1ms=35567,
>     <2ms=29102, <5ms=8284, <10ms=1502, <20ms=1224, <50ms=1012, >50ms=9
> 
> 
> glib loop with this patch applied:
> =================================
> - First run:
>     avg=1.76456ms, max=65.865ms, min=4.8e-05ms (n=86601)
> 
> - second run (with queue time distribution):
>     avg=1.80822ms, max=53.4864ms, min=5.5e-05ms (n=86001)
>     queue time distribution: <0.1ms=3393, <0.2ms=500, <0.5ms=6166,
>     <1ms=31334,
>     <2ms=30842, <5ms=9671, <10ms=1636, <20ms=1414, <50ms=1037, >50ms=8
> 
> 
> As you can see, the stats vary a bit between runs. It's not obviously better
> or
> worse between the old loop and the new glib loop. The average queue times are
> just slightly higher on the glib ones, but I'd have to do more test runs to
> know
> whether that was statistically significant.
> 
> One thing I did notice was that during the (VT-based) boot phase, the max
> queue
> time tended to be only about 20ms, but once the graphical environment started
> up, I started getting more longer queue times. This could just be due to the
> increased CPU load of the host, though.
> 

I'm quite impressed that after the release patch the average was not lower!
Honestly looks like is quite hard to have consistent results.
Even using same input and removing guest cpu (using replay utility) the data
can change quite a lot. Would be possibly helpful to have a less multitasking
environment (I do the test on my laptop with all running).

> Also worth noting is that if the queue is full, we simply don't create pipe
> items. So, it's possible that qxl commands are waiting longer to be
> processed,
> but that doesn't show up on these statistics.
> 

Right, I think with replay utility taking into account that command would
be the same would make easier to understand which algorithm loose more
commands.

> 
> Somewhat unrelated question: Does anybody know how the value of 50 was chosen
> for MAX_PIPE_SIZE? If we could increase that value a bit, it would reduce the
> number of times we'd hit the full queue scenario. Of course it would also
> increase memory use, but maybe we're willing to trade some additional memory
> usage in the host to avoid blocking the guest regularly due to unhandled qxl
> commands.
> 

Yes, but it's just a workaround. No idea where the MAX_PIPE_SIZE came and
looking at git history is useless (came from first git import).

> (instrumentation patch here if you're intersted: http://ur1.ca/ocsfl)
> 

Interesting! Just I would save the total queue, not the average. Every time
you compute and save you loose a bit of precision (actually involving conversion
from gdouble to guint64 you have a truncation). But error should be quite similar
in all cases and actually should be on average n_sent / 4 nanoseconds.

Frediano
_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/spice-devel