Hi Martin,

I had some time to run NPB. LU-HP is not available in NPB 3.3.1, so I
used LU instead. Here are my results:

 LU Benchmark Completed.
 Class           =                        C
 Size            =            162x 162x 162
 Iterations      =                      250
 Time in seconds =                    XX.XX
 Total threads   =                       80
 Avail threads   =                       80
 Mop/s total     =                 51420.84
 Mop/s/thread    =                   642.76
 Operation type  =           floating point
 Verification    =               SUCCESSFUL
 Version         =                    3.3.1
 Compile date    =              28 Oct 2014

 Compile options:
    F77          = gfortran
    FLINK        = $(F77)
    F_LIB        = (none)
    F_INC        = (none)
    FFLAGS       = -O3 -fopenmp -mcmodel=medium
    FLINKFLAGS   = -O3 -fopenmp
    RAND         = (none)

With NUMA balancing disabled
(sudo bash -c "echo 0 > /proc/sys/kernel/numa_balancing"):

1st run: Time in seconds = 39.65
2nd run: Time in seconds = 39.47
3rd run: Time in seconds = 41.31
4th run: Time in seconds = 40.42

The measurements without NUMA balancing are stable at around 40 seconds.

With NUMA balancing enabled
(sudo bash -c "echo 1 > /proc/sys/kernel/numa_balancing"):

1st run: Time in seconds = 53.89
2nd run: Time in seconds = 51.95
3rd run: Time in seconds = 56.22
4th run: Time in seconds = 64.20

Enabling this option increases the runtime by more than 50% in the worst
case (64.20 s against a baseline of about 40 s).
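
For reference, the averages and the relative slowdown can be reproduced
from the eight timings above with a small Python sketch. The variable and
function names are my own choice for illustration and are not part of NPB;
the baseline is assumed to be the mean of the four runs with balancing
disabled.

```python
from statistics import mean

# Wall-clock times in seconds, copied from the four runs of each setting.
disabled = [39.65, 39.47, 41.31, 40.42]   # numa_balancing = 0
enabled = [53.89, 51.95, 56.22, 64.20]    # numa_balancing = 1


def slowdown_percent(t, baseline):
    """Relative increase of t over baseline, in percent."""
    return (t / baseline - 1.0) * 100.0


# Assumption: use the mean of the "disabled" runs as the baseline.
baseline = mean(disabled)
avg_slowdown = slowdown_percent(mean(enabled), baseline)
worst_slowdown = slowdown_percent(max(enabled), baseline)

print(f"mean disabled:       {baseline:.2f} s")
print(f"mean enabled:        {mean(enabled):.2f} s")
print(f"average slowdown:    {avg_slowdown:.1f} %")
print(f"worst-case slowdown: {worst_slowdown:.1f} %")
```

With these numbers the average slowdown comes out at roughly 40% and the
worst single run at roughly 60% over the baseline.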

Here is some information about the hardware:

Kernel:
Linux inwest 3.16.4-1-ARCH #1 SMP PREEMPT Mon Oct 6 08:22:27 CEST 2014 x86_64 GNU/Linux

CPU:
Intel(R) Xeon(R) CPU E7- 4850 @ 2.00GHz

numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 40 41 42 43 44 45 46 47 48 49
node 0 size: 64427 MB
node 0 free: 63912 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59
node 1 size: 64509 MB
node 1 free: 64066 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66 67 68 69
node 2 size: 64509 MB
node 2 free: 63987 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39 70 71 72 73 74 75 76 77 78 79
node 3 size: 64509 MB
node 3 free: 64035 MB
node distances:
node   0   1   2   3
  0:  10  21  21  21
  1:  21  10  21  21
  2:  21  21  10  21
  3:  21  21  21  10

Regards,
Andreas

2014-10-27 22:27 GMT+01:00 Martin Ichilevici de Oliveira <iomartin@xxxxxxxxxxxx>:
> Hello Andreas,
>
> Thank you for your reply. Please check my comments inline.
>
>> it would be good to know which applications/benchmarks you were running.
>>
>> Have you tried out some well known and open source benchmarks?
>>
>> NAS Parallel Benchmarks -
>> http://www.nas.nasa.gov/publications/npb.html (Fortran Code)
>> NPB2.3-omp-C.tgz (C version NPB in OpenMP) -
>> http://www.hpcs.cs.tsukuba.ac.jp/omni-compiler/download/NPB2.3-omp-C.tgz
>> Stream - http://www.cs.virginia.edu/stream/FTP/Code/stream.c
>
> Sorry, I should have mentioned that. I tried some NAS benchmarks:
> bt, sp and lu-hp. bt and sp were around 60% slower with the balancing
> turned on, and lu-hp was 10 times slower.
>
> I also ran Lulesh, which was roughly 100% slower with the balancing
> turned on.
>
>> Do you have "numad" running on your machine? If it is running you
>> should stop it.
>
> I checked and it's not running.
>
> Cheers,
> Martin