Cilk is total nonsense to use, of course,
unless you have an NCSA supercomputer worth half a billion
and you are too lazy to write something that uses it efficiently.
I remember a comparison with Cilkchess here at my home,
where Cilkchess lost a factor of 40 somewhere thanks to Cilk :)
Don Dailey: I remember him sitting over here (in my house),
playing Cilkchess remotely against Diep.
Then, after Cilkchess had lost 4 games or so, getting 5000 positions a
second on a single core, Don said: "OK, now it's time to play without Cilk".
At first I didn't know what he was referring to; he meant running
the program on his laptop without Cilk. It got 200k positions a
second. A factor of 40 faster :)
We can prove mathematically why it is tough for applications that need low
latency to use Cilk.
The overhead is HUGE.
On Mon, 28 Apr 2014, Stefan Ruppert wrote:
Hi,
In the last few days I wanted to test the Cilk Plus feature of GCC 4.9. The
standard Fibonacci example works fine here. But my program has high kernel
CPU usage and is slower than the non-Cilk Plus (single-threaded) version.
My program calculates the minimum-distance route between the cities in
Germany passed on the command line. It builds a complete tree where the root
is the start of the route and each leaf is a possible end of the route.
A route with 10 cities takes about 2.1 seconds in the non-Cilk version. The
Cilk version, which spawns 4 Cilk tasks, takes about 2.5 seconds:
Non-cilk (single threaded) version:
$ time ./myroute 65830 60306 55130 Sörgenloch 25849 65439 52388 Berlin
München Hamburg
leaves: 362880
route: |-- Kriftel --[8.14497km]--> Wicker, Main-Taunus- Kreis
--[9.55325km]--> Weisenau --[12.6317km]--> Sörgenloch --[148.816km]-->
Wissersheim --[160.604km]--> Frankfurt am Main --[217.106km]--> München
--[354.114km]--> Berlin --[255.292km]--> Hamburg --[137.506km]-->
Westertilli--| total distance is 1303.77km.
real 0m2.118s
user 0m2.040s
sys 0m0.032s
Cilk version with 4 worker threads:
$ time ./myroute -c 65830 60306 55130 Sörgenloch 25849 65439 52388 Berlin
München Hamburg
leaves: 362880
route: |-- Kriftel --[8.14497km]--> Wicker, Main-Taunus- Kreis
--[9.55325km]--> Weisenau --[12.6317km]--> Sörgenloch --[148.816km]-->
Wissersheim --[160.604km]--> Frankfurt am Main --[217.106km]--> München
--[354.114km]--> Berlin --[255.292km]--> Hamburg --[137.506km]-->
Westertilli--| total distance is 1303.77km.
real 0m2.564s
user 0m3.972s
sys 0m4.468s
I also found that with the number of workers set to 2, I get a slightly
faster response time than the non-Cilk version:
Cilk version with 2 worker threads:
$ time ./myroute -c 65830 60306 55130 Sörgenloch 25849 65439 52388 Berlin
München Hamburg
leaves: 362880
route: |-- Kriftel --[8.14497km]--> Wicker, Main-Taunus- Kreis
--[9.55325km]--> Weisenau --[12.6317km]--> Sörgenloch --[148.816km]-->
Wissersheim --[160.604km]--> Frankfurt am Main --[217.106km]--> München
--[354.114km]--> Berlin --[255.292km]--> Hamburg --[137.506km]-->
Westertilli--| total distance is 1303.77km.
real 0m2.045s
user 0m2.452s
sys 0m0.988s
Any idea why the kernel cpu usage is so high?
Regards,
Stefan
PS: Here is my config:
I built GCC 4.9 from source with the following options:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/devel/build/gcc-4.9.0/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.9.0/configure --prefix=/opt/devel/build/gcc-4.9.0
--with-system-zlib --with-gmp=/opt/devel/build/gcc-4.9.0
--with-mpfr=/opt/devel/build/gcc-4.9.0
--with-cloog=/opt/devel/build/gcc-4.9.0 --with-mpc=/opt/devel/build/gcc-4.9.0
--with-tune=generic --enable-languages=c,c++ --enable-multilib
--with-multilib-list=m32,m64
Thread model: posix
gcc version 4.9.0 (GCC)
$ uname -a
Linux myarm 3.5.0-34-generic #55-Ubuntu SMP Thu Jun 6 20:18:19 UTC 2013
x86_64 x86_64 x86_64 GNU/Linux