On Tue, 2015-12-22 at 11:53 -0500, Frediano Ziglio wrote:
> 
> > On Mon, 2015-12-21 at 09:54 -0600, Jonathon Jongsma wrote:
> > > On Mon, 2015-12-21 at 10:17 -0500, Frediano Ziglio wrote:
> > > > 
> > > > > On Fri, 2015-12-18 at 06:10 -0500, Frediano Ziglio wrote:
> > > > > > > Hi
> > > > > > > 
> > > > > > > ----- Original Message -----
> > > > > > > > On Wed, 2015-12-16 at 18:16 -0500, Marc-André Lureau wrote:
> > > > > > > > > Hi
> > > > > > > > > 
> > > > > > > > > ----- Original Message -----
> > > > > > > > > I actually think that was it. The worker thread may currently block because the client is too slow. But then, when we release an item, the pipe is hopefully going to be smaller, and thus we can try to dispatch again without waiting for the timeout. (Waiting for timers can really degrade performance in some cases, like you see today with the qemu virgl timer, but that's another story where I can show you figures when we merge gl scanouts in spice ;)
> > > > > > > > 
> > > > > > > > Then I'm still confused. In the old loop, we had a very explicit sequence of behavior. For each loop:
> > > > > > > > 
> > > > > > > > 1 calculate timeout for poll() function
> > > > > > > > 2 poll() fds with timeout from step 1
> > > > > > > > 3 service expired timers
> > > > > > > > 4 handle fd watch events
> > > > > > > > 5 process cursor commands from guest
> > > > > > > > 6 process display commands from guest
> > > > > > > > 7 push queued messages to clients
> > > > > > > > 8 GOTO 1
> > > > > > > > 
> > > > > > > > After pushing the queued messages to the client, we did not go back and handle cursor and display commands again. We went back to the top of the loop and did a poll with timeout again. So I fail to see why the glib loop needs to short-circuit the timeout and return to step 5 when the old design didn't.
> > > > > > > > 
> > > > > > > > Or is this change unrelated to the new glib loop and the worker has simply always been too slow even with the old loop implementation?
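(As a reminder of what that ordering looks like in code, the old explicit loop boils down to roughly the following. This is only a simplified sketch with made-up helper names, not the actual red-worker.c code.)

#include <poll.h>

typedef struct RedWorker RedWorker;                   /* opaque here */

/* assumed helpers, for illustration only */
int  timers_next_timeout_ms(RedWorker *worker);       /* step 1 */
void timers_service_expired(RedWorker *worker);       /* step 3 */
void watches_dispatch(RedWorker *worker);             /* step 4 */
void process_cursor_commands(RedWorker *worker);      /* step 5 */
void process_display_commands(RedWorker *worker);     /* step 6 */
void push_queued_messages(RedWorker *worker);         /* step 7 */
struct pollfd *worker_poll_fds(RedWorker *worker, int *n_fds);

static void worker_main_loop(RedWorker *worker)
{
    for (;;) {
        int n_fds;
        struct pollfd *fds = worker_poll_fds(worker, &n_fds);

        /* 1-2: sleep in poll() at most until the next timer expires */
        int timeout_ms = timers_next_timeout_ms(worker);
        int n_ready = poll(fds, n_fds, timeout_ms);

        /* 3: service timers that expired while we were polling */
        timers_service_expired(worker);

        /* 4: handle the fd watch events reported by poll() */
        if (n_ready > 0) {
            watches_dispatch(worker);
        }

        /* 5-6: drain guest cursor and display commands; these stop early
         * when a client pipe reaches its size limit */
        process_cursor_commands(worker);
        process_display_commands(worker);

        /* 7: push whatever was queued out to the clients; 8: go to 1 */
        push_queued_messages(worker);
    }
}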
> > > > > > > I wouldn't say "too slow", and I wouldn't try to mimic the old code either. For me the goal is simply to process the pipe/command as fast as possible and, if possible, get rid of timeouts.
> > > > > > 
> > > > > > I can understand that not mimicking can be better, but at the moment this is causing a bit of a regression and this patch was trying to avoid some of it (I think).
> > > > > > 
> > > > > > I looked more deeply at the glib code. The sources are stored (appended) in multiple lists (one for every priority, and at the moment we are using a single priority, which is similar to the former code anyway). Considering that the processing code is registered at worker initialization and that watches are recreated when event masks are updated, this possibly leads the watches to end up after the processing source, making the loop:
> > > > > > 
> > > > > > 1 calculate timeout for poll() function
> > > > > > 2 poll() fds with timeout from step 1
> > > > > > 3 process cursor commands from guest
> > > > > > 4 process display commands from guest
> > > > > > 5 push queued messages to clients
> > > > > > 6 service expired timers
> > > > > > 7 handle fd watch events
> > > > > > 8 GOTO 1
> > > > > > 
> > > > > > (not sure about timers, but they are not that frequent). This obviously makes things worse, as after step 7 you have to wait for the poll (2) before commands get processed again. I'm not saying that the patch doesn't also address some other issues, just that the glib code actually makes this patch more useful (that is, the glib loop has regressions).
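(For what it's worth, GLib does give a source a way to avoid that wait: its prepare() callback can force a zero poll timeout whenever work is still pending, so the loop comes straight back to dispatching instead of sleeping. A generic sketch of that idea follows; it is not the spice-server code, and worker_has_pending_commands()/worker_process_commands() are made-up names.)

#include <glib.h>

typedef struct RedWorker RedWorker;                 /* opaque here */

/* made-up helpers, for illustration only */
gboolean worker_has_pending_commands(RedWorker *worker);
void     worker_process_commands(RedWorker *worker);

typedef struct {
    GSource base;
    RedWorker *worker;
} CommandSource;

/* prepare() runs before the loop polls: returning TRUE with a zero
 * timeout tells GLib to dispatch us again without sleeping in poll(),
 * so pending commands do not have to wait for an fd event or a timer. */
static gboolean command_source_prepare(GSource *source, gint *timeout)
{
    CommandSource *cs = (CommandSource *) source;
    gboolean pending = worker_has_pending_commands(cs->worker);

    *timeout = pending ? 0 : -1;
    return pending;
}

static gboolean command_source_check(GSource *source)
{
    CommandSource *cs = (CommandSource *) source;
    return worker_has_pending_commands(cs->worker);
}

static gboolean command_source_dispatch(GSource *source, GSourceFunc callback,
                                        gpointer user_data)
{
    CommandSource *cs = (CommandSource *) source;
    worker_process_commands(cs->worker);
    return G_SOURCE_CONTINUE;
}

static GSourceFuncs command_source_funcs = {
    command_source_prepare,
    command_source_check,
    command_source_dispatch,
    NULL,
};

static GSource *command_source_new(RedWorker *worker, GMainContext *context)
{
    GSource *source = g_source_new(&command_source_funcs, sizeof(CommandSource));

    ((CommandSource *) source)->worker = worker;
    g_source_attach(source, context);
    return source;
}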
> > > > > > Looking at it more broadly: what is this patch (and possibly "worker: avoid server hanging when no client are connected.") trying to fix? Basically, handling slow clients. With fast clients you don't have many issues: the pipe queue is (or should be) quite small and all commands are processed. But with slow clients we only process until the queue reaches a given size. I honestly don't know everything about what happens next. When you do a push on RedChannel you try to send as many items as possible until the network queue is full. If the client is just not processing, you can only add items to the queue. After a while the limit is reached and, to avoid queuing indefinitely, you stop processing commands. Now I have some questions:
> > > > > > - What happens in the guest? Does the screen simply stop getting updates after a while? And what if the client asks for an update?
> > > > > > - Does item collapsing occur only if no client is connected? I think what should happen is that if there are too many items, the items should be collapsed (rendered down onto surfaces) and processing should continue.
> > > > > > - What happens if the client keeps sending requests that trigger appending items but avoids reading? Can a DoS be triggered?
> > > > > > 
> > > > > > Frediano
> > > > > 
> > > > > So I did some very simplistic testing here. Basically I looked at the red_process_display() function to see how often we encountered a full pipe queue (since that is the situation that Marc-André thinks this patch is trying to improve). I was curious how often this happened, so I just instrumented the code to count how many times we called red_process_display() and how many of those times we returned from this function due to a full queue (rather than because there were no more commands to process). Here's the patch:
> > > > > 
> > > > > diff --git a/server/red-worker.c b/server/red-worker.c
> > > > > index dfaf4ba..dec635d 100644
> > > > > --- a/server/red-worker.c
> > > > > +++ b/server/red-worker.c
> > > > > @@ -229,7 +235,10 @@ static int red_process_display(RedWorker *worker, uint32_t max_pipe_size, int *r
> > > > >      QXLCommandExt ext_cmd;
> > > > >      int n = 0;
> > > > >      uint64_t start = spice_get_monotonic_time_ns();
> > > > > +    static int total = 0;
> > > > > +    static int full = 0;
> > > > > 
> > > > > +    total++;
> > > > >      if (!worker->running) {
> > > > >          *ring_is_empty = TRUE;
> > > > >          return n;
> > > > > @@ -335,6 +344,9 @@ static int red_process_display(RedWorker *worker, uint32_t max_pipe_size, int *r
> > > > >              return n;
> > > > >          }
> > > > >      }
> > > > > +    full++;
> > > > > +    fprintf(stderr, "display channel exceeded max pipe size: %i/%i (%g)\n",
> > > > > +            full, total, (double)full / total * 100.0);
> > > > >      return n;
> > > > >  }
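(To make the two exit paths being counted explicit: the control flow of the instrumented function is roughly the following. This is only a rough sketch with made-up helper names; the real red_process_display() does considerably more.)

/* display_pipe_size(), pop_display_command() and handle_display_command()
 * are illustrative stand-ins, not real spice-server functions. */
static int process_display_sketch(RedWorker *worker, uint32_t max_pipe_size,
                                  int *ring_is_empty)
{
    int n = 0;

    *ring_is_empty = FALSE;
    while (display_pipe_size(worker) <= max_pipe_size) {
        QXLCommandExt ext_cmd;

        if (!pop_display_command(worker, &ext_cmd)) {
            /* nothing left in the guest ring: the common exit path */
            *ring_is_empty = TRUE;
            return n;
        }
        /* turn the guest command into pipe items for the clients */
        handle_display_command(worker, &ext_cmd);
        n++;
    }
    /* pipe limit reached: stop pulling guest commands until the client
     * drains some items -- this is the "full queue" return that the
     * total/full counters in the patch above are measuring */
    return n;
}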
> > > > > 
> > > > > This was my procedure:
> > > > > 
> > > > > - Boot RHEL7 guest
> > > > > - wait for boot sequence to finish
> > > > > - log into GNOME
> > > > > - start web browser, open a page
> > > > > - open another page in another tab
> > > > > - open video player and play video in full screen
> > > > > - enable second display
> > > > > - resize second display
> > > > > - disable second display
> > > > > - exit video player
> > > > > - shut down guest
> > > > > 
> > > > > I did this same test with the current master branch, with the glib loop patch applied, and also with this patch on top of the glib loop patch.
> > > > > 
> > > > > What I noticed is that during the boot sequence and the shutdown sequence (when we're in a VT and text is scrolling up the screen), we hit the queue limit quite frequently. In fact, nearly 30% of the time that we call red_process_display(), we return because our pipe queue is full. However, after X has started and we're in the desktop, we never hit a full-queue condition. When I shut down the guest, it briefly returns to the VT until it powers down, and we again hit some full-queue conditions.
> > > > > 
> > > > > This was true of both the old loop and the new glib loop (with and without the patch).
> > > > > 
> > > > > There were some differences between the old and new loop. In particular, the new glib loop called red_process_display() more times in roughly the same elapsed time. It also had a higher percentage of calls that returned because the queue was full (but again, this did not happen during "normal" operation, but only during the boot and shutdown phase). The glib loop with this patch on top had roughly the same percentage as the glib loop on its own, but the total number of calls to red_process_display() was higher, unsurprisingly.
> > > > > 
> > > > > Here is some example output:
> > > > > 
> > > > > Old loop: http://ur1.ca/ocnyn
> > > > > Glib loop: http://ur1.ca/ocnyp
> > > > > glib loop with this patch applied: http://ur1.ca/oco1u
> > > > > 
> > > > > Notice that lines such as this indicate that GDM has started or I've logged into GNOME:
> > > > > 
> > > > > main_channel_handle_parsed: agent start
> > > > > 
> > > > > Then I use the desktop for about a minute before shutting down again. No more log messages are printed until shutdown begins.
> > > > > 
> > > > > I also did another test where I used the guest "normally" (browsing internet, watching videos, writing text in libreoffice, etc) for quite a while, and still never triggered the full-queue condition.
> > > > > 
> > > > > I'm not sure what conclusions to draw from this exactly. There is probably some better way to test this. Perhaps we should test how long a pipe item stays in the queue before it is sent.
> > > > > 
> > > > > Jonathon
> > > > 
> > > > That's great!
> > > > 
> > > > I'm quite scared about the loop counts using GLib. Mainly, it looks like the loop is executed twice as often (with possibly increased CPU usage).
> > > 
> > > Yeah, with the plain glib loop it's about 1.5x, but with the additional wakeup-after-releasing-a-pipe-item patch, it's closer to double.
> > > 
> > > > I agree it would be helpful to see the latency of the items.
> > > > 
> > > > Did you try increasing client latency or decreasing bandwidth? I think the replay utility could be useful for these tests (I didn't try).
> > > > 
> > > > Frediano
> > > 
> > > Nope, I didn't try any latency or bandwidth adjustments yet. This was just a first step to get a rough comparison and then figure out exactly what we should be testing. I haven't actually used the replay utility much yet, so I may have some questions about it when I start using it. But that's a good suggestion.
> > > 
> > > Jonathon
> > 
> > OK, here's some initial test results showing the amount of time a pipe item waits in the queue before being sent out. Basically I save the time when a pipe item is added to the queue, and then check the time when that item is sent to the client. The procedure I used to test is basically the same as that described above for my previous test.
> > 
> > Old loop:
> > ========
> > - First run:
> > avg=1.64109ms, max=56.881ms, min=5.4e-05ms (n=85801)
> > 
> > - Second run (with queue time distribution):
> > avg=1.66152ms, max=60.9958ms, min=4.1e-05ms (n=84801)
> > queue time distribution: <0.1ms=3495, <0.2ms=509, <0.5ms=7035, <1ms=36476, <2ms=26316, <5ms=7399, <10ms=1285, <20ms=1365, <50ms=914, >50ms=7
> > 
> > glib loop:
> > =========
> > - First run:
> > avg=1.78949ms, max=49.901ms, min=4.8e-05ms (n=85801)
> > 
> > - Second run (with queue time distribution):
> > avg=1.67802ms, max=65.0128ms, min=3.9e-05ms (n=87601)
> > queue time distribution: <0.1ms=4313, <0.2ms=500, <0.5ms=6088, <1ms=35567, <2ms=29102, <5ms=8284, <10ms=1502, <20ms=1224, <50ms=1012, >50ms=9
> > 
> > glib loop with this patch applied:
> > =================================
> > - First run:
> > avg=1.76456ms, max=65.865ms, min=4.8e-05ms (n=86601)
> > 
> > - Second run (with queue time distribution):
> > avg=1.80822ms, max=53.4864ms, min=5.5e-05ms (n=86001)
> > queue time distribution: <0.1ms=3393, <0.2ms=500, <0.5ms=6166, <1ms=31334, <2ms=30842, <5ms=9671, <10ms=1636, <20ms=1414, <50ms=1037, >50ms=8
> > 
> > As you can see, the stats vary a bit between runs. It's not obviously better or worse between the old loop and the new glib loop. The average queue times are just slightly higher on the glib ones, but I'd have to do more test runs to know whether that was statistically significant.
> > 
> > One thing I did notice was that during the (VT-based) boot phase, the max queue time tended to be only about 20ms, but once the graphical environment started up, I started getting more of the longer queue times. This could just be due to the increased CPU load of the host, though.
> 
> I'm quite impressed that after the release patch the average was not lower! Honestly, it looks like it is quite hard to get consistent results. Even using the same input and removing guest CPU (using the replay utility) the data can change quite a lot. It would possibly help to have a less multitasking environment (I do the tests on my laptop with everything running).
> 
> > Also worth noting is that if the queue is full, we simply don't create pipe items. So, it's possible that qxl commands are waiting longer to be processed, but that doesn't show up in these statistics.
> 
> Right. I think the replay utility, taking into account that the commands would be the same, would make it easier to understand which algorithm loses more commands.
> 
> > Somewhat unrelated question: does anybody know how the value of 50 was chosen for MAX_PIPE_SIZE? If we could increase that value a bit, it would reduce the number of times we'd hit the full-queue scenario. Of course it would also increase memory use, but maybe we're willing to trade some additional memory usage in the host to avoid blocking the guest regularly due to unhandled qxl commands.
> 
> Yes, but it's just a workaround. No idea where MAX_PIPE_SIZE came from, and looking at the git history is useless (it came from the first git import).
> 
> > (instrumentation patch here if you're interested: http://ur1.ca/ocsfl)
> 
> Interesting! Only, I would save the total queue time, not the average. Every time you compute and save you lose a bit of precision (and since it involves a conversion from gdouble to guint64 you get a truncation). But the error should be quite similar in all cases, and on average it should actually be n_sent / 4 nanoseconds.
Oops. It certainly would be a better idea to avoid losing precision like that. Not sure what I was thinking. But as you say, even if the numbers aren't accurate they should at least be somewhat comparable.

So I recorded a spice session and used the replay utility to compare these patches yet again. I did six runs of each and averaged the stats from the six runs to come up with a slightly more consistent number for each scenario. Here are the results:

Old Loop:
Avg queue time: 4.750275ms
Max queue time: 177.0363333333ms
<0.1ms: 1654.1666666667
<0.2ms: 1040.3333333333
<0.5ms: 14759.3333333333
<1ms: 44561.6666666667
<2ms: 11806.6666666667
<5ms: 4091
<10ms: 1045.6666666667
<20ms: 1026
<50ms: 3006.3333333333
>50ms: 2843.1666666667

Glib Loop:
Avg queue time: 4.6821966667ms
Max queue time: 178.8768333333ms
<0.1ms: 1832.6666666667
<0.2ms: 753.3333333333
<0.5ms: 12149.5
<1ms: 41183.1666666667
<2ms: 17457.6666666667
<5ms: 4774
<10ms: 1084.8333333333
<20ms: 1057.3333333333
<50ms: 2816.5
>50ms: 2692

Glib Loop with patch:
Avg queue time: 5.6059783333ms
Max queue time: 195.8168333333ms
<0.1ms: 2083.5
<0.2ms: 710
<0.5ms: 9365.3333333333
<1ms: 37973.8333333333
<2ms: 19989.6666666667
<5ms: 7505.6666666667
<10ms: 1193.5
<20ms: 920.3333333333
<50ms: 2540.3333333333
>50ms: 3485.5

The old loop and the glib loop are actually quite comparable. The glib loop with the patch seems to be significantly worse in average queue time.
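For reference, the kind of per-item accounting that avoids the truncation would look roughly like this. It's a hypothetical sketch, not the instrumentation patch linked above; the ItemTiming struct and the hook points are made up, and only spice_get_monotonic_time_ns() is the same helper used in the earlier diff.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

uint64_t spice_get_monotonic_time_ns(void);   /* prototype assumed, as used above */

/* imagine this embedded in (or hung off) the real pipe item */
typedef struct {
    uint64_t queued_time_ns;
} ItemTiming;

static uint64_t total_queue_time_ns = 0;
static uint64_t n_sent = 0;

/* stamp the item when it is added to the pipe */
static void timing_on_queue(ItemTiming *t)
{
    t->queued_time_ns = spice_get_monotonic_time_ns();
}

/* accumulate the total wait in integer nanoseconds when the item is sent */
static void timing_on_send(ItemTiming *t)
{
    total_queue_time_ns += spice_get_monotonic_time_ns() - t->queued_time_ns;
    n_sent++;
}

/* convert to a double only when reporting, so no precision is lost on the way */
static void timing_report(void)
{
    if (n_sent > 0) {
        fprintf(stderr, "avg queue time: %g ms (n=%" PRIu64 ")\n",
                (double) total_queue_time_ns / n_sent / 1e6, n_sent);
    }
}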