> On Fri, 2015-12-18 at 06:10 -0500, Frediano Ziglio wrote:
> > > Hi
> > > 
> > > ----- Original Message -----
> > > > On Wed, 2015-12-16 at 18:16 -0500, Marc-André Lureau wrote:
> > > > > Hi
> > > > > 
> > > > > ----- Original Message -----
> > > > > I actually think that was it. The worker thread may currently block
> > > > > because the client is too slow. But then, when we release an item,
> > > > > the pipe is hopefully going to be smaller, and thus we can try to
> > > > > dispatch again without waiting for the timeout. (Waiting for timers
> > > > > can really degrade performance in some cases, as you see today with
> > > > > the qemu virgl timer, but that's another story where I can show you
> > > > > figures when we merge gl scanouts in spice ;)
> > > > 
> > > > Then I'm still confused. In the old loop, we had a very explicit
> > > > sequence of behavior. For each loop:
> > > > 
> > > > 1 calculate timeout for poll() function
> > > > 2 poll() fds with timeout from step 1
> > > > 3 service expired timers
> > > > 4 handle fd watch events
> > > > 5 process cursor commands from guest
> > > > 6 process display commands from guest
> > > > 7 push queued messages to clients
> > > > 8 GOTO 1
> > > > 
> > > > After pushing the queued messages to the client, we did not go back
> > > > and handle cursor and display commands again. We went back to the top
> > > > of the loop and did a poll with timeout again. So I fail to see why
> > > > the glib loop needs to short-circuit the timeout and return to step 5
> > > > when the old design didn't.
> > > > 
> > > > Or is this change unrelated to the new glib loop, and has the worker
> > > > simply always been too slow, even with the old loop implementation?
> > > 
> > > I wouldn't say "too slow", and I wouldn't try to mimic the old code
> > > either. For me the goal is simply to process the pipe/commands as fast
> > > as possible and, if possible, get rid of timeouts.
> > 
> > I can understand that not mimicking the old code can be better, but at
> > the moment this is causing a bit of a regression, and this patch was
> > trying to avoid some of that (I think).
> > 
> > I looked more deeply at the glib code. The sources are stored (appended)
> > in multiple lists, one for each priority, and at the moment we are using
> > a single priority, which is similar to the former code anyway.
> > Considering that the processing code is registered at worker
> > initialization and that watches are recreated when event masks are
> > updated, the watches possibly end up after the processing source, making
> > the loop:
> > 
> > 1 calculate timeout for poll() function
> > 2 poll() fds with timeout from step 1
> > 3 process cursor commands from guest
> > 4 process display commands from guest
> > 5 push queued messages to clients
> > 6 service expired timers
> > 7 handle fd watch events
> > 8 GOTO 1
> > 
> > (I'm not sure about the timers, but they are not that frequent.)
> > This obviously makes things worse, as after step 7 you have to wait for
> > the poll (2) to get commands processed again.
> > I'm not saying that the patch doesn't also address other issues, but the
> > glib code actually makes this patch more useful (that is, glib has
> > regressions).
> > 
> > Looking at it more broadly: what are this patch (and possibly "worker:
> > avoid server hanging when no client are connected.") trying to fix?
> > Basically, they are trying to handle slow clients. On fast clients you
> > don't have many issues: the pipe queue is (or should be) quite small and
> > all commands are processed. But on slow clients we process until the
> > queue reaches a given size. I honestly don't have full knowledge of what
> > happens next. When you do a push on a RedChannel you try to send as many
> > items as possible until the network queue is full. If the client is just
> > not processing, you can only add items to the queue. After a while the
> > limit is reached, and to avoid queuing indefinitely you stop processing
> > commands. Now I have some questions:
> > - What happens in the guest? Does the screen simply stop getting updates
> >   after a while? And what if the client asks for an update?
> > - Does item collapsing occur only if no client is connected? I think
> >   what should happen is that if there are too many items, they should be
> >   collapsed (reducing drawing on surfaces) and processing should
> >   continue.
> > - What happens if a client keeps sending requests that trigger appending
> >   items but avoids reading them? Can a DoS be triggered?
> > 
> > Frediano
> 
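As a side note, the source-ordering point made above can be illustrated with
a small standalone GLib program (a sketch for illustration only, not spice
code): sources attached to a context at the same priority are dispatched in
the order they were attached, so a command-processing source attached before
the fd watches gets dispatched first within an iteration.

  /* Sketch: GLib dispatches same-priority sources in attach order. */
  #include <glib.h>
  #include <stdio.h>

  static gboolean announce(gpointer name)
  {
      printf("dispatched: %s\n", (const char *) name);
      return G_SOURCE_CONTINUE;
  }

  int main(void)
  {
      GMainContext *ctx = g_main_context_new();

      /* stand-ins for "process commands" and "handle fd watch" */
      GSource *process = g_idle_source_new();
      GSource *watch = g_idle_source_new();

      g_source_set_priority(process, G_PRIORITY_DEFAULT);
      g_source_set_priority(watch, G_PRIORITY_DEFAULT);
      g_source_set_callback(process, announce, (gpointer) "process commands", NULL);
      g_source_set_callback(watch, announce, (gpointer) "handle fd watch", NULL);

      g_source_attach(process, ctx);  /* attached first => dispatched first */
      g_source_attach(watch, ctx);

      /* A single iteration dispatches both ready sources, printing
       * "process commands" before "handle fd watch". */
      g_main_context_iteration(ctx, FALSE);

      g_source_unref(process);
      g_source_unref(watch);
      g_main_context_unref(ctx);
      return 0;
  }

(Builds with: gcc sketch.c $(pkg-config --cflags --libs glib-2.0).)
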
> So I did some very simplistic testing here. Basically, I looked at the
> red_process_display() function to see how often we encounter a full pipe
> queue (since that is the situation that Marc-André thinks this patch is
> trying to improve). I was curious how often this happened, so I just
> instrumented the code to count how many times we call red_process_display()
> and how many of those calls return because of a full queue (rather than
> because there were no more commands to process). Here's the patch:
> 
> diff --git a/server/red-worker.c b/server/red-worker.c
> index dfaf4ba..dec635d 100644
> --- a/server/red-worker.c
> +++ b/server/red-worker.c
> @@ -229,7 +235,10 @@ static int red_process_display(RedWorker *worker, uint32_t max_pipe_size, int *r
>      QXLCommandExt ext_cmd;
>      int n = 0;
>      uint64_t start = spice_get_monotonic_time_ns();
> +    static int total = 0;
> +    static int full = 0;
>  
> +    total++;
>      if (!worker->running) {
>          *ring_is_empty = TRUE;
>          return n;
> @@ -335,6 +344,9 @@ static int red_process_display(RedWorker *worker, uint32_t max_pipe_size, int *r
>              return n;
>          }
>      }
> +    full++;
> +    fprintf(stderr, "display channel exceeded max pipe size: %i/%i (%g)\n",
> +            full, total, (double)full / total * 100.0);
>      return n;
>  }
> 
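For context, the "full queue" return that this counter tracks is a
back-pressure rule: red_process_display() stops pulling commands from the
guest ring once the display channel's pipe has grown beyond max_pipe_size,
and it gets another chance on a later loop iteration, after the client has
drained some items. Below is a self-contained sketch of that rule; the types
and helper logic are made up for illustration and are not the actual
red-worker code.

  /* Simplified sketch of the back-pressure rule in red_process_display():
   * stop producing pipe items once the pipe exceeds max_pipe_size.
   * FakeWorker and its fields are hypothetical stand-ins. */
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  typedef struct {
      bool running;
      uint32_t pipe_size;        /* items currently queued for the client */
      uint32_t pending_commands; /* commands waiting in the guest ring */
  } FakeWorker;

  static int process_display_commands(FakeWorker *worker, uint32_t max_pipe_size)
  {
      int n = 0;

      while (worker->running) {
          if (worker->pipe_size > max_pipe_size) {
              /* Queue full: return and let the client drain
               * (this is the case the patch above counts). */
              return n;
          }
          if (worker->pending_commands == 0) {
              /* Command ring empty: nothing left to process. */
              return n;
          }
          worker->pending_commands--;
          worker->pipe_size++;    /* each command produces a pipe item */
          n++;
      }
      return n;
  }

  int main(void)
  {
      FakeWorker w = { .running = true, .pipe_size = 48, .pending_commands = 100 };
      int n = process_display_commands(&w, 50);
      printf("processed %d commands before the pipe filled up\n", n);
      return 0;
  }
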
> This was my procedure:
> 
> - Boot RHEL7 guest
> - wait for boot sequence to finish
> - log into GNOME
> - start web browser, open a page
> - open another page in another tab
> - open video player and play video in full screen
> - enable second display
> - resize second display
> - disable second display
> - exit video player
> - shut down guest
> 
> I did this same test with the current master branch, with the glib loop
> patch applied, and also with this patch on top of the glib loop patch.
> 
> What I noticed is that during the boot sequence and the shutdown sequence
> (when we're in a VT and text is scrolling up the screen), we hit the queue
> limit quite frequently. In fact, nearly 30% of the time that we call
> red_process_display(), we return because our pipe queue is full. However,
> after X has started and we're in the desktop, we never hit a full-queue
> condition. When I shut down the guest, it briefly returns to the VT until
> it powers down, and we again hit some full-queue conditions.
> 
> This was true of both the old loop and the new glib loop (with and without
> the patch).
> 
> There were some differences between the old and new loop. In particular,
> the new glib loop called red_process_display() more times in roughly the
> same elapsed time. It also had a higher percentage of calls that returned
> because the queue was full (but again, this did not happen during "normal"
> operation, only during the boot and shutdown phases). The glib loop with
> this patch on top had roughly the same percentage as the glib loop on its
> own, but the total number of calls to red_process_display() was higher,
> unsurprisingly.
> 
> Here is some example output:
> 
> Old loop: http://ur1.ca/ocnyn
> Glib loop: http://ur1.ca/ocnyp
> Glib loop with this patch applied: http://ur1.ca/oco1u
> 
> Notice that lines such as this indicate that GDM has started or I've logged
> into GNOME:
> 
> main_channel_handle_parsed: agent start
> 
> Then I use the desktop for about a minute before shutting down again. No
> more log messages are printed until shutdown begins.
> 
> I also did another test where I used the guest "normally" (browsing the
> internet, watching videos, writing text in LibreOffice, etc.) for quite a
> while, and still never triggered the full-queue condition.
> 
> I'm not sure what conclusions to draw from this exactly. There is probably
> some better way to test this. Perhaps we should measure how long a pipe
> item stays in the queue before it is sent.
> 
> Jonathon

That's great!

I'm quite scared by the loop counts with GLib, though. It mainly looks like
the loop is executed twice as much (with a possible CPU usage increase).

I agree it would be helpful to see the latency of the items.
Did you try increasing client latency or decreasing bandwidth?
I think the replay utility could be useful for these tests (I didn't try).

Frediano
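One way to prototype the measurement both Jonathon and Frediano suggest (how
long a pipe item waits between being queued and being sent) would be to
stamp each item with a monotonic timestamp when it is queued and report the
delta when it goes out, in the spirit of the spice_get_monotonic_time_ns()
call visible in the patch above. The following is a hypothetical,
self-contained sketch, not existing spice code.

  /* Hypothetical sketch: measure how long a pipe item waits in the queue. */
  #include <stdint.h>
  #include <stdio.h>
  #include <time.h>

  typedef struct {
      uint64_t queued_ns;   /* set when the item is added to the pipe */
      /* ... real item payload would go here ... */
  } TimedPipeItem;

  static uint64_t monotonic_time_ns(void)
  {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
  }

  static void item_queued(TimedPipeItem *item)
  {
      item->queued_ns = monotonic_time_ns();
  }

  static void item_sent(const TimedPipeItem *item)
  {
      uint64_t waited = monotonic_time_ns() - item->queued_ns;
      fprintf(stderr, "pipe item waited %.3f ms before being sent\n",
              waited / 1e6);
  }

  int main(void)
  {
      TimedPipeItem item;
      item_queued(&item);
      /* ... the item would sit in the channel pipe here ... */
      item_sent(&item);
      return 0;
  }

Aggregating such deltas per channel (for example, maximum and average per
second) would give a more direct view of how the old and new loops behave
with a slow client than counting full-queue returns alone.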