Hi,
over the last few days I have been testing the Cilk Plus feature of
GCC 4.9. The standard Fibonacci example works fine here, but my own
program shows high kernel CPU usage and is slower than the non-Cilk
(single-threaded) version.
My program calculates the shortest route through the cities passed on
the command line (all in Germany). It builds a complete tree in which
the root is the start of the route and each leaf is a possible end of
the route.
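To illustrate the kind of search described here, a minimal C++ sketch
follows. It is not the actual myroute code: the names `SearchResult`,
`walk` and `search_all_routes` and the precomputed distance matrix
`dist` are assumptions made for illustration. With the start city fixed,
n cities give (n-1)! leaves, so 10 cities give 9! = 362880, which
matches the "leaves:" line in the output further down.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical result type: leaves visited and best total distance in km.
struct SearchResult {
    std::size_t leaves = 0;
    double best_km = std::numeric_limits<double>::infinity();
};

// Depth-first walk over the route tree. dist[a][b] is an assumed
// precomputed distance matrix; visited marks cities already on the route.
void walk(const std::vector<std::vector<double>>& dist,
          std::vector<bool>& visited,
          std::size_t current, std::size_t depth, double so_far,
          SearchResult& out)
{
    const std::size_t n = dist.size();
    if (depth == n) {              // every city is on the route: a leaf
        ++out.leaves;
        if (so_far < out.best_km) out.best_km = so_far;
        return;
    }
    for (std::size_t next = 0; next < n; ++next) {
        if (visited[next]) continue;
        visited[next] = true;
        walk(dist, visited, next, depth + 1,
             so_far + dist[current][next], out);
        visited[next] = false;     // backtrack
    }
}

// City 0 is the fixed start of the route (the root of the tree).
SearchResult search_all_routes(const std::vector<std::vector<double>>& dist)
{
    SearchResult out;
    if (dist.empty()) return out;
    std::vector<bool> visited(dist.size(), false);
    visited[0] = true;
    walk(dist, visited, 0, 1, 0.0, out);
    return out;
}
```

Calling `search_all_routes` with a 10x10 matrix visits every ordering of
the nine remaining cities exactly once. The route is open (no return to
the start), as in the output below.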
A route with 10 cities takes about 2.1 seconds in the non-Cilk version.
The Cilk version, which spawns 4 Cilk tasks, needs about 2.6 seconds:
Non-Cilk (single-threaded) version:
$ time ./myroute 65830 60306 55130 Sörgenloch 25849 65439 52388 Berlin
München Hamburg
leaves: 362880
route: |-- Kriftel --[8.14497km]--> Wicker, Main-Taunus- Kreis
--[9.55325km]--> Weisenau --[12.6317km]--> Sörgenloch --[148.816km]-->
Wissersheim --[160.604km]--> Frankfurt am Main --[217.106km]--> München
--[354.114km]--> Berlin --[255.292km]--> Hamburg --[137.506km]-->
Westertilli--| total distance is 1303.77km.
real 0m2.118s
user 0m2.040s
sys 0m0.032s
Cilk version with 4 worker threads:
$ time ./myroute -c 65830 60306 55130 Sörgenloch 25849 65439 52388
Berlin München Hamburg
leaves: 362880
route: |-- Kriftel --[8.14497km]--> Wicker, Main-Taunus- Kreis
--[9.55325km]--> Weisenau --[12.6317km]--> Sörgenloch --[148.816km]-->
Wissersheim --[160.604km]--> Frankfurt am Main --[217.106km]--> München
--[354.114km]--> Berlin --[255.292km]--> Hamburg --[137.506km]-->
Westertilli--| total distance is 1303.77km.
real 0m2.564s
user 0m3.972s
sys 0m4.468s
I also found that with the number of workers set to 2, the run is
slightly faster than the non-Cilk version:
Cilk version with 2 worker threads:
$ time ./myroute -c 65830 60306 55130 Sörgenloch 25849 65439 52388
Berlin München Hamburg
leaves: 362880
route: |-- Kriftel --[8.14497km]--> Wicker, Main-Taunus- Kreis
--[9.55325km]--> Weisenau --[12.6317km]--> Sörgenloch --[148.816km]-->
Wissersheim --[160.604km]--> Frankfurt am Main --[217.106km]--> München
--[354.114km]--> Berlin --[255.292km]--> Hamburg --[137.506km]-->
Westertilli--| total distance is 1303.77km.
real 0m2.045s
user 0m2.452s
sys 0m0.988s
Any idea why the kernel CPU usage is so high?
Regards,
Stefan
PS: Here is my config:
I built GCC 4.9 from source with the following options:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/devel/build/gcc-4.9.0/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.9.0/configure
--prefix=/opt/devel/build/gcc-4.9.0 --with-system-zlib
--with-gmp=/opt/devel/build/gcc-4.9.0
--with-mpfr=/opt/devel/build/gcc-4.9.0
--with-cloog=/opt/devel/build/gcc-4.9.0
--with-mpc=/opt/devel/build/gcc-4.9.0 --with-tune=generic
--enable-languages=c,c++ --enable-multilib --with-multilib-list=m32,m64
Thread model: posix
gcc version 4.9.0 (GCC)
$ uname -a
Linux myarm 3.5.0-34-generic #55-Ubuntu SMP Thu Jun 6 20:18:19 UTC 2013
x86_64 x86_64 x86_64 GNU/Linux