Performance of Xspice - some results, and a potential patch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I've spent several weeks analyzing the network performance of Xspice
against two test cases.  I also crafted a patch which implements an
alternate mode for the xf86-video-qxl driver that dramatically improves
network performance.

The two test cases are simple [1]; a script drives either Libre Office
or Microsoft Office 2007.  They have a few minutes of typing, a bit of
graphics (the test case draws a smiley face), and a little bit of
interaction with the UI.

For now, I'm focused on measuring network performance; I plan to later
do some analysis to try to get at the quality of the user experience as
well.  Network performance, particularly the total # of packets, has a
dramatic effect on Spice performance over a remote network.

For a test involving 5 minutes of light use of LibreOffice, the results
are as follows:

		 Packets	  Bytes
Xspice		148,428		19,647,168
Tight VNC	 19,980		 4,724,880
SSH -X		 48,894 	77,733,200
SSH -CX		 21,445		 5,096,702

You can see that Xspice network performance is pretty bad.

This is partly by design; Spice is designed to replicate the philosophy
of the X server, wherein graphics operations themselves - not pixmaps -
are transmitted across the wire.

Further, Spice as a whole is really built around qemu and Windows qxl
running on a local LAN; Xspice is really just an experimental feature,
and I'm pretty much the first person to examine it's network flow.

However, if I modify the xf86-video-qxl driver to shift modes into a
render and send mode, whereby only changed pixmaps are sent across the
wire (and that on a periodic basis), I get dramatically better results.
 That patch is attached; these are the results:

Xspice + deferred_fps  14,910 	2,971,886

As you can see, with that change, Spice vaults into the lead.

The numbers for Office 2007 are somewhat similar, although ssh -CX does
not fare nearly as well.  (The underlying behaviors are quite different;
Office 2007 is surface + render heavy; Libre Office is fill region heavy).

Additionally, I spent a lot of energy trying to decide if this mode
could be injected into the Spice server itself, where it could
potentially benefit the whole stack.  I'm grateful to Alon and Soren for
helping me privately with that effort.  However, in the end, I could not
see a path to do this with the kind of tight efficiency that I think is
required.  I had a test case where an ill behaved X application
(fluxbox) would do 27,000 pixel draws; and I just couldn't find a way to
run those draws through the Spice server efficiently.  My old nemesis,
red_worker.c, defeats me again :-/.

I also spent some energy experimenting with modes where we kicked the
spice server into a video streaming mode.  I had only limited success
with that.  But I did do some analysis that shows that the test case
(Libre Office) requires about 3,000,000 bytes to stream a high quality
Theora video.  Given that the deferred_fps patch matches that, and it is
much lighter on the CPU, I've decided to not pursue that approach for now.

At any rate, I would appreciate any comments or feedback on the basic
approach outlined by the patch.

Cheers,

Jeremy


[1] The tests are repeatable, automated scripts, and can be found here:
    git://people.freedesktop.org/~jwhite/spice-measure

They are not at all user friendly or easy to parse, I'm afraid.  They
rely on some crude mechanisms (e.g. I have a dedicated machine, and I
measure bandwidth by sampling ifconfig results before + after).  You can
try it by having the gtk spice client connected to an 800x600 system
with a minimal window manager, with LibreOffice maximized.  Then try
running
  make
  ./limeasure "spice display 0"
and don't blame me if it destroys your system <grin>.
>From 73bd9cb14fa559af40899077d8f1ddf9dfb065b8 Mon Sep 17 00:00:00 2001
From: Jeremy White <jwhite@xxxxxxxxxxxxxxx>
Date: Mon, 6 Aug 2012 13:58:04 -0500
Subject: [PATCH] Implement a Deferred Frames mode, in which we defer sending
 any messages to the Spice server except at a very specific
 rate, at which time we just send the modified screen.
To: wine-patches <wine-patches@xxxxxxxxxx>

---
 src/qxl.h         |    5 ++
 src/qxl_driver.c  |   40 +++++++++++--
 src/qxl_surface.c |  173 +++++++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 200 insertions(+), 18 deletions(-)

diff --git a/src/qxl.h b/src/qxl.h
index 1db0581..9ef0b27 100644
--- a/src/qxl.h
+++ b/src/qxl.h
@@ -123,6 +123,7 @@ enum {
     OPTION_SPICE_TLS_CIPHERS,
     OPTION_SPICE_CACERT_FILE,
     OPTION_SPICE_DH_FILE,
+    OPTION_SPICE_DEFERRED_FPS,
 #endif
     OPTION_COUNT,
 };
@@ -245,6 +246,8 @@ struct _qxl_screen_t
         uint32_t       bytes_pp;
         uint8_t        *data, *flipped;
     } guest_primary;
+
+    uint32_t           deferred_fps;
 #endif /* XSPICE */
 };
 
@@ -362,6 +365,8 @@ Bool		    qxl_surface_put_image    (qxl_surface_t *dest,
 					      const char *src, int src_pitch);
 void		    qxl_surface_unref        (surface_cache_t *cache,
 					      uint32_t surface_id);
+
+void                surface_frame_ticker     (qxl_screen_t *qxl);
 					      
 #if HAS_DEVPRIVATEKEYREC
 extern DevPrivateKeyRec uxa_pixmap_index;
diff --git a/src/qxl_driver.c b/src/qxl_driver.c
index e4c477a..accb964 100644
--- a/src/qxl_driver.c
+++ b/src/qxl_driver.c
@@ -115,6 +115,8 @@ const OptionInfoRec DefaultOptions[] = {
         "SpiceCacertFile",          OPTV_STRING,    {0}, FALSE},
     { OPTION_SPICE_DH_FILE,
         "SpiceDhFile",              OPTV_STRING,    {0}, FALSE},
+    { OPTION_SPICE_DEFERRED_FPS,
+        "DeferredFPS",             OPTV_INTEGER,   {0}, FALSE},
 #endif
 
     { -1, NULL, OPTV_NONE, {0}, FALSE }
@@ -762,7 +764,7 @@ qxl_create_screen_resources(ScreenPtr pScreen)
 	qxl_surface_kill (surf);
     
     set_surface (pPixmap, qxl->primary);
-    
+    qxl_surface_set_pixmap (qxl->primary, pPixmap);
     return TRUE;
 }
 
@@ -932,11 +934,16 @@ qxl_create_pixmap (ScreenPtr screen, int w, int h, int depth, unsigned usage)
     {
 	/* ErrorF ("   Successfully created surface in video memory\n"); */
 	
-	pixmap = fbCreatePixmap (screen, 0, 0, depth, usage);
+        if (qxl->deferred_fps > 0)
+	    pixmap = fbCreatePixmap (screen, w, h, depth, usage);
+        else
+        {
+	    pixmap = fbCreatePixmap (screen, 0, 0, depth, usage);
 
-	screen->ModifyPixmapHeader(pixmap, w, h,
-				   -1, -1, -1,
-				   NULL);
+            screen->ModifyPixmapHeader(pixmap, w, h,
+                                       -1, -1, -1,
+                                       NULL);
+        }
 	
 #if 0
 	ErrorF ("Create pixmap %p with surface %p\n", pixmap, surface);
@@ -1070,10 +1077,19 @@ setup_uxa (qxl_screen_t *qxl, ScreenPtr screen)
 
 #ifdef XSPICE
 
+SpiceCoreInterface *core;
+SpiceTimer *frames_timer;
+void frame_ticker(void *opaque);
+void frame_ticker(void *opaque)
+{
+    qxl_screen_t *qxl = (qxl_screen_t *) opaque;
+    surface_frame_ticker(qxl);
+    core->timer_start(frames_timer, 1000 / qxl->deferred_fps);
+}
+
 static void
 spiceqxl_screen_init(ScrnInfoPtr pScrn, qxl_screen_t *qxl)
 {
-    SpiceCoreInterface *core;
 
     // Init spice
     if (!qxl->spice_server) {
@@ -1084,6 +1100,11 @@ spiceqxl_screen_init(ScrnInfoPtr pScrn, qxl_screen_t *qxl)
         qxl_add_spice_display_interface(qxl);
         qxl->worker->start(qxl->worker);
         qxl->worker_running = TRUE;
+        if (qxl->deferred_fps)
+        {
+            frames_timer = core->timer_add(frame_ticker, qxl);
+            core->timer_start(frames_timer, 1000 / qxl->deferred_fps);
+        }
     }
     qxl->spice_server = qxl->spice_server;
 }
@@ -1529,6 +1550,13 @@ qxl_pre_init(ScrnInfoPtr pScrn, int flags)
 	       qxl->enable_image_cache? "Enabled" : "Disabled");
     xf86DrvMsg(scrnIndex, X_INFO, "Fallback Cache: %s\n",
 	       qxl->enable_fallback_cache? "Enabled" : "Disabled");
+#ifdef XSPICE
+    qxl->deferred_fps = get_int_option(qxl->options, OPTION_SPICE_DEFERRED_FPS, "XSPICE_DEFERRED_FPS");
+    if (qxl->deferred_fps > 0)
+        xf86DrvMsg(scrnIndex, X_INFO, "Deferred FPS: %d\n", qxl->deferred_fps);
+    else
+        xf86DrvMsg(scrnIndex, X_INFO, "Deferred Frames: Disabled\n");
+#endif
     
     if (!qxl_map_memory(qxl, scrnIndex))
 	goto out;
diff --git a/src/qxl_surface.c b/src/qxl_surface.c
index 113f09b..17501c2 100644
--- a/src/qxl_surface.c
+++ b/src/qxl_surface.c
@@ -69,6 +69,8 @@ struct qxl_surface_t
     uxa_access_t	access_type;
     RegionRec		access_region;
 
+    RegionRec		updated_region;
+
     void *		address;
     void *		end;
     
@@ -198,6 +200,8 @@ surface_cache_init (surface_cache_t *cache, qxl_screen_t *qxl)
 	
 	REGION_INIT (
 	    NULL, &(cache->all_surfaces[i].access_region), (BoxPtr)NULL, 0);
+	REGION_INIT (
+	    NULL, &(cache->all_surfaces[i].updated_region), (BoxPtr)NULL, 0);
 	cache->all_surfaces[i].access_type = UXA_ACCESS_RO;
 
 	if (i) /* surface 0 is the primary surface */
@@ -402,6 +406,7 @@ qxl_surface_cache_create_primary (surface_cache_t	*cache,
     surface->evacuated = NULL;
     
     REGION_INIT (NULL, &(surface->access_region), (BoxPtr)NULL, 0);
+    REGION_INIT (NULL, &(surface->updated_region), (BoxPtr)NULL, 0);
     surface->access_type = UXA_ACCESS_RO;
     
     return surface;
@@ -640,18 +645,6 @@ retry:
     surface->address = address;    
     surface->end = (char *)address + stride * height;
     
-    cmd = make_surface_cmd (cache, surface->id, QXL_SURFACE_CMD_CREATE);
-
-    cmd->u.surface_create.format = format;
-    cmd->u.surface_create.width = width;
-    cmd->u.surface_create.height = height;
-    cmd->u.surface_create.stride = - stride;
-
-    cmd->u.surface_create.data =
-      physical_address (qxl, surface->address, qxl->vram_mem_slot);
-
-    push_surface_cmd (cache, cmd);
-
     dev_addr = (uint32_t *)((uint8_t *)surface->address + stride * (height - 1));
 
     surface->dev_image = pixman_image_create_bits (
@@ -664,6 +657,21 @@ retry:
 
     n_live++;
     
+    if (qxl->deferred_fps > 0 && surface->id != 0)
+        return surface;
+
+    cmd = make_surface_cmd (cache, surface->id, QXL_SURFACE_CMD_CREATE);
+
+    cmd->u.surface_create.format = format;
+    cmd->u.surface_create.width = width;
+    cmd->u.surface_create.height = height;
+    cmd->u.surface_create.stride = - stride;
+
+    cmd->u.surface_create.data =
+      physical_address (qxl, surface->address, qxl->vram_mem_slot);
+
+    push_surface_cmd (cache, cmd);
+
     return surface;
 }
 
@@ -760,6 +768,12 @@ send_destroy (qxl_surface_t *surface)
 	pixman_image_unref (surface->dev_image);
     if (surface->host_image)
 	pixman_image_unref (surface->host_image);
+
+    if (surface->cache->qxl->deferred_fps > 0)
+    {
+	qxl_surface_recycle (surface->cache, surface->id);
+        return;
+    }
     
     cmd = make_surface_cmd (surface->cache, surface->id, QXL_SURFACE_CMD_DESTROY);
     
@@ -925,6 +939,18 @@ qxl_surface_prepare_access (qxl_surface_t  *surface,
     if (!pScrn->vtSema)
         return FALSE;
 
+    if (surface->cache->qxl->deferred_fps > 0)
+    {
+        fbPrepareAccess(pixmap);
+        if (access == UXA_ACCESS_RW)
+        {
+            Bool throwaway_bool;
+            RegionAppend(&surface->updated_region, region);
+            RegionValidate(&surface->updated_region, &throwaway_bool);
+        }
+        return TRUE;
+    }
+
     REGION_INIT (NULL, &new, (BoxPtr)NULL, 0);
     REGION_SUBTRACT (NULL, &new, region, &surface->access_region);
 
@@ -979,6 +1005,65 @@ translate_rect (struct QXLRect *rect)
     rect->left = rect->top = 0;
 }
 
+static void upload_one_region(qxl_surface_t *surface, int x1, int y1, int x2, int y2)
+{
+    struct QXLRect rect;
+    struct QXLDrawable *drawable;
+    struct QXLImage *image;
+    qxl_screen_t *qxl = surface->cache->qxl;
+    FbBits *data;
+    int stride;
+    int bpp;
+
+    rect.left = x1;
+    rect.right = x2;
+    rect.top = y1;
+    rect.bottom = y2;
+
+    drawable = make_drawable (qxl, surface->id, QXL_DRAW_COPY, &rect);
+    drawable->u.copy.src_area = rect;
+    translate_rect (&drawable->u.copy.src_area);
+    drawable->u.copy.rop_descriptor = ROPD_OP_PUT;
+    drawable->u.copy.scale_mode = 0;
+    drawable->u.copy.mask.flags = 0;
+    drawable->u.copy.mask.pos.x = 0;
+    drawable->u.copy.mask.pos.y = 0;
+    drawable->u.copy.mask.bitmap = 0;
+
+    fbGetPixmapBitsData(surface->pixmap, data, stride, bpp);
+    image = qxl_image_create (
+	qxl, (const uint8_t *)data, x1, y1, x2 - x1, y2 - y1, stride * sizeof(*data),
+	surface->bpp == 24 ? 4 : surface->bpp / 8, TRUE);
+    drawable->u.copy.src_bitmap =
+	physical_address (qxl, image, qxl->main_mem_slot);
+
+    push_drawable (qxl, drawable);
+}
+
+static void
+upload_updated_regions(qxl_surface_t *surface)
+{
+    int n_boxes;
+    BoxPtr boxes;
+    n_boxes = RegionNumRects(&surface->updated_region);
+    boxes = RegionRects(&surface->updated_region);
+
+    while (n_boxes--)
+    {
+        upload_one_region (surface, boxes->x1, boxes->y1, boxes->x2, boxes->y2);
+        boxes++;
+    }
+}
+
+void surface_frame_ticker(qxl_screen_t *qxl)
+{
+    upload_updated_regions(qxl->primary);
+    RegionUninit(&qxl->primary->updated_region);
+    RegionInit(&qxl->primary->updated_region, NULL, 0);
+}
+
+
+
 static void
 real_upload_box (qxl_surface_t *surface, int x1, int y1, int x2, int y2)
 {
@@ -1050,6 +1135,12 @@ qxl_surface_finish_access (qxl_surface_t *surface, PixmapPtr pixmap)
     int n_boxes;
     BoxPtr boxes;
 
+    if (surface->cache->qxl->deferred_fps > 0)
+    {
+        fbFinishAccess(pixmap);
+        return;
+    }
+
     n_boxes = REGION_NUM_RECTS (&surface->access_region);
     boxes = REGION_RECTS (&surface->access_region);
 
@@ -1231,6 +1322,27 @@ qxl_surface_solid (qxl_surface_t *destination,
     struct QXLRect qrect;
     uint32_t p;
 
+    if (qxl->deferred_fps > 0)
+    {
+        FbBits *bits;
+        int stride;
+        int bpp;
+        int xoff, yoff;
+        struct pixman_box16 box;
+        RegionPtr region;
+        Bool throwaway_bool;
+        fbGetDrawable((DrawablePtr)destination->pixmap, bits, stride, bpp, xoff, yoff);
+        pixman_fill((uint32_t *) bits, stride, bpp, x1 + xoff, y1 + yoff, x2 - x1, y2 - y1, destination->u.solid_pixel);
+        fbValidateDrawable(destination->pixmap);
+        fbFinishAccess(destination->pixmap);
+        box.x1 = x1; box.x2 = x2; box.y1 = y1; box.y2 = y2;
+        region = RegionCreate(&box, 1);
+        RegionAppend(&destination->updated_region, region);
+        RegionValidate(&destination->updated_region, &throwaway_bool);
+        RegionUninit(region);
+        return;
+    }
+
     qrect.top = y1;
     qrect.bottom = y2;
     qrect.left = x1;
@@ -1267,6 +1379,27 @@ qxl_surface_copy (qxl_surface_t *dest,
     struct QXLDrawable *drawable;
     struct QXLRect qrect;
 
+    if (qxl->deferred_fps > 0)
+    {
+        struct pixman_box16 box;
+        RegionPtr region;
+        Bool throwaway_bool;
+        GCPtr pgc;
+
+        pgc = GetScratchGC(dest->pixmap->drawable.depth, dest->pixmap->drawable.pScreen);
+        ValidateGC(&dest->pixmap->drawable, pgc);
+        fbCopyArea(&dest->u.copy_src->pixmap->drawable, &dest->pixmap->drawable, pgc, src_x1, src_y1, width, height, dest_x1, dest_y1);
+        FreeScratchGC(pgc);
+
+        box.x1 = dest_x1; box.x2 = dest_x1 + width; box.y1 = dest_y1; box.y2 = dest_y1 + height;
+        region = RegionCreate(&box, 1);
+        RegionAppend(&dest->updated_region, region);
+        RegionValidate(&dest->updated_region, &throwaway_bool);
+        RegionUninit(region);
+
+        return;
+    }
+
 #ifdef DEBUG_REGIONS
     print_region (" copy src", &(dest->u.copy_src->access_region));
     print_region (" copy dest", &(dest->access_region));
@@ -1339,6 +1472,22 @@ qxl_surface_put_image (qxl_surface_t *dest,
     struct QXLRect rect;
     struct QXLImage *image;
     
+    if (qxl->deferred_fps > 0)
+    {
+        struct pixman_box16 box;
+        RegionPtr region;
+        Bool throwaway_bool;
+        box.x1 = x; box.x2 = x + width; box.y1 = y; box.y2 = y + height;
+        region = RegionCreate(&box, 1);
+        RegionAppend(&dest->updated_region, region);
+        RegionValidate(&dest->updated_region, &throwaway_bool);
+        RegionUninit(region);
+
+        /* We can avoid doing the put image ourselves, as the uxa driver
+           will fall back and do it for us if we return false */
+        return FALSE;
+    }
+
     rect.left = x;
     rect.right = x + width;
     rect.top = y;
-- 
1.7.10.4

_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/spice-devel

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]     [Monitors]