Hello,

I'm experimenting with the kernel's automatic NUMA balancing and I'm
experiencing a performance loss when I turn the balancing on through the
sysctl kernel.numa_balancing. I've run some benchmarks, and the
performance with the balancing turned *off* was consistently better. A
typical figure was 50% slower with balancing on, but in one particular
case it got 10 times slower.

AFAIK, none of the benchmarks are NUMA-aware, so I was expecting some
kind of performance gain. If any of them are in fact NUMA-aware, I could
understand a slight performance loss, but only slight: since the scan
period adapts to the ratio of local/remote faults, the balancer should
quickly detect that memory placement is already optimal and increase the
scan period so that the overhead is small.

I would appreciate it if someone could help me find what's causing this
behavior. I'm running CentOS 6.5 with Linux 3.17.1 (self-compiled).
Below is some information that might be useful. If you need anything
else, just tell me.

Thank you,
Martin

$ for x in /proc/sys/kernel/numa_balancing* ; do echo $x ; cat $x; done
/proc/sys/kernel/numa_balancing
32768
/proc/sys/kernel/numa_balancing_scan_delay_ms
1000
/proc/sys/kernel/numa_balancing_scan_period_max_ms
60000
/proc/sys/kernel/numa_balancing_scan_period_min_ms
1000
/proc/sys/kernel/numa_balancing_scan_size_mb
256

(I tried playing with different values of scan size, but it didn't make
much of a difference.)

$ numactl --hardware
available: 4 nodes (0,2,4,6)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 16076 MB
node 0 free: 15713 MB
node 2 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 2 size: 16157 MB
node 2 free: 15903 MB
node 4 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 4 size: 16157 MB
node 4 free: 15998 MB
node 6 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 6 size: 16141 MB
node 6 free: 15941 MB
node distances:
node   0   2   4   6
  0:  10  16  16  16
  2:  16  10  16  16
  4:  16  16  10  16
  6:  16  16  16  10

$ grep NUMA .config
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANTS_PROT_NUMA_PROT_NONE=y
CONFIG_ARCH_USES_NUMA_PROT_NONE=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_NUMA_BALANCING=y
# CONFIG_X86_NUMACHIP is not set
CONFIG_NUMA=y
CONFIG_AMD_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
# CONFIG_NUMA_EMU is not set
CONFIG_USE_PERCPU_NUMA_NODE_ID=y
CONFIG_ACPI_NUMA=y

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 1
model name      : AMD Opteron(TM) Processor 6272
stepping        : 2
microcode       : 0x6000629
cpu MHz         : 1400.000
cache size      : 2048 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 32
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs            : fxsave_leak
bogomips        : 4200.06
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb

(63 others just like this)

$ cat /proc/meminfo
MemTotal:       66083340 kB
MemFree:        64445100 kB
MemAvailable:   64477352 kB
Buffers:          172848 kB
Cached:           132988 kB
SwapCached:            0 kB
Active:           857468 kB
Inactive:          70176 kB
Active(anon):     621996 kB
Inactive(anon):        8 kB
Active(file):     235472 kB
Inactive(file):    70168 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      65535996 kB
SwapFree:       65535996 kB
Dirty:                28 kB
Writeback:             0 kB
AnonPages:        692268 kB
Mapped:            21468 kB
Shmem:               200 kB
Slab:             182152 kB
SReclaimable:      96752 kB
SUnreclaim:        85400 kB
KernelStack:       13120 kB
PageTables:         5928 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    98577664 kB
Committed_AS:    1363304 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      400824 kB
VmallocChunk:   34309010120 kB
HardwareCorrupted:     0 kB
AnonHugePages:    665600 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      160128 kB
DirectMap2M:     5064704 kB
DirectMap1G:    61865984 kB
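
One more data point that could help check the local/remote fault ratio mentioned above: the numa_* counters in /proc/vmstat can be snapshotted before and after a benchmark run and then compared. The following is only a minimal sketch. It assumes the kernel exposes numa_hint_faults and numa_hint_faults_local in /proc/vmstat (kernels built with CONFIG_NUMA_BALANCING normally do), and numa_delta is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print how much each numa_* counter grew between two
# saved copies of /proc/vmstat, plus the share of hinting faults that were
# local. Assumes "name value" lines, which is the /proc/vmstat format.
# Usage: numa_delta BEFORE_FILE AFTER_FILE
numa_delta() {
    awk '
        # First file: remember the starting value of each numa_* counter.
        NR == FNR { if ($1 ~ /^numa_/) before[$1] = $2; next }
        # Second file: print the increase since the first snapshot.
        $1 ~ /^numa_/ {
            delta[$1] = $2 - before[$1]
            printf "%-24s %d\n", $1, delta[$1]
        }
        END {
            faults = delta["numa_hint_faults"]
            if (faults > 0)
                printf "%-24s %.1f\n", "local_fault_percent",
                       100 * delta["numa_hint_faults_local"] / faults
        }
    ' "$1" "$2"
}
```

Intended use would be to run `cat /proc/vmstat > before.txt`, run one benchmark, run `cat /proc/vmstat > after.txt`, and then call `numa_delta before.txt after.txt`. A low local percentage together with a large numa_pages_migrated delta would suggest pages are being moved back and forth rather than settling.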