[spice] streaming: Use the optimal number of threads for VP8 encoding

We run the VP8 encoder in real-time mode so it uses only the minimum
amount of time needed to encode each frame. However, by default it
uses only one thread, so for large or complex frames it may run at
less than the source fps. Besides resulting in dropped frames, this
blocks the main server thread for most of the time.
So this patch configures the VP8 encoder to use all of the CPU's physical
cores, resulting in less wall clock time spent in encode_frame().

Signed-off-by: Francois Gouget <fgouget@xxxxxxxxxxxxxxx>
---

I am resubmitting this patch because I think the reasons for not
applying it last time were wrong. See:
https://lists.freedesktop.org/archives/spice-devel/2016-March/027026.html

Here is an illustration of the impact of threading for the 
big_buck_bunny_1080p_h264.mov video:

http://fgouget.free.fr/tmp/Spice-vp8-threads.png
http://fgouget.free.fr/tmp/Spice-vp8-threads.xls

The graph shows the time spent in encode_frame() (taken from the 
standard traces) when vp8enc uses 1, 2 or 4 threads.

One can see that the one-thread line spends quite a bit of time above 
the 33 ms mark which corresponds to the 30 fps of the source material. 
This means dropped frames. Indeed, the x axis corresponds to the frame 
number and we can clearly see the 1-thread line getting out of sync with 
the others as it encoded fewer frames.

The two-thread line is much lower and only goes above the 33 ms mark for 
a short time. The four-thread line is a bit lower still but we can also 
see diminishing returns there.

I'll also note that the h264 encoder automatically uses multiple 
threads already so this patch only brings vp8enc in line with it.


 configure.ac               |  4 ++++
 server/gstreamer-encoder.c | 32 ++++++++++++++++++++++++++++++--
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index 68aed15..6742577 100644
--- a/configure.ac
+++ b/configure.ac
@@ -150,6 +150,10 @@ AC_SUBST([SPICE_PROTOCOL_MIN_VER])
 PKG_CHECK_MODULES([GLIB2], [glib-2.0 >= 2.22 gio-2.0 >= 2.22])
 AS_VAR_APPEND([SPICE_REQUIRES], [" glib-2.0 >= 2.22 gio-2.0 >= 2.22"])
 
+AC_CHECK_LIB(glib-2.0, g_get_num_processors,
+             AC_DEFINE([HAVE_G_GET_NUMPROCESSORS], 1, [Defined if we have g_get_num_processors()]),,
+             $GLIB2_LIBS)
+
 PKG_CHECK_MODULES([GOBJECT2], [gobject-2.0 >= 2.22])
 AS_VAR_APPEND([SPICE_REQUIRES], [" gobject-2.0 >= 2.22"])
 
diff --git a/server/gstreamer-encoder.c b/server/gstreamer-encoder.c
index a101ab6..eb2a28c 100644
--- a/server/gstreamer-encoder.c
+++ b/server/gstreamer-encoder.c
@@ -866,6 +866,27 @@ static GstFlowReturn new_sample(GstAppSink *gstappsink, gpointer video_encoder)
     return GST_FLOW_OK;
 }
 
+static int physical_core_count = 0;
+static int get_physical_core_count(void)
+{
+    if (!physical_core_count) {
+#ifdef HAVE_G_GET_NUMPROCESSORS
+        physical_core_count = g_get_num_processors();
+#elif defined(_SC_NPROCESSORS_ONLN)
+        physical_core_count = sysconf(_SC_NPROCESSORS_ONLN);
+#endif
+        if (system("egrep -l '^flags\\b.*: .*\\bht\\b' /proc/cpuinfo >/dev/null 2>&1") == 0) {
+            /* Hyperthreading is enabled so divide by two to get the number
+             * of physical cores.
+             */
+            physical_core_count = physical_core_count / 2;
+        }
+        if (physical_core_count == 0)
+            physical_core_count = 1;
+    }
+    return physical_core_count;
+}
+
 static const gchar* get_gst_codec_name(SpiceGstEncoder *encoder)
 {
     switch (encoder->base.codec_type)
@@ -887,6 +908,7 @@ static const gchar* get_gst_codec_name(SpiceGstEncoder *encoder)
     }
 }
 
+/* A helper for spice_gst_encoder_encode_frame() */
 static gboolean create_pipeline(SpiceGstEncoder *encoder)
 {
 #ifdef HAVE_GSTREAMER_0_10
@@ -925,11 +947,17 @@ static gboolean create_pipeline(SpiceGstEncoder *encoder)
          *   75% CPU usage while speed simply prioritizes encoding speed.
          * - deadline is supposed to be set in microseconds but in practice
          *   it behaves like a boolean.
+         * - At least up to GStreamer 1.6.2, vp8enc cannot be trusted to pick
+         *   the optimal number of threads. Also exceeding the number of
+         *   physical cores really degrades image quality.
+         * - token-parts/token-partitions parallelizes more operations.
          */
+        int threads = get_physical_core_count();
+        int parts = threads < 2 ? 0 : threads < 4 ? 1 : threads < 8 ? 2 : 3;
 #ifdef HAVE_GSTREAMER_0_10
-        gstenc_opts = g_strdup_printf("mode=cbr min-quantizer=10 error-resilient=true max-latency=0 speed=7");
+        gstenc_opts = g_strdup_printf("mode=cbr min-quantizer=10 error-resilient=true max-latency=0 speed=7 threads=%d token-parts=%d", threads, parts);
 #else
-        gstenc_opts = g_strdup_printf("end-usage=cbr min-quantizer=10 error-resilient=default lag-in-frames=0 deadline=1 cpu-used=4");
+        gstenc_opts = g_strdup_printf("end-usage=cbr min-quantizer=10 error-resilient=default lag-in-frames=0 deadline=1 cpu-used=4 threads=%d token-partitions=%d", threads, parts);
 #endif
         break;
         }
-- 
2.10.1