On 2018-01-09 17:47, Andrey Ryabinin wrote:
> Attached user space program I used to see the difference.
> Usage:
>       gcc -O2 -o strscpy strscpy_test.c
>       ./strscpy {b|w} src_str_len count
>
> src_str_len - length of source string in between 1-4096
> count - how many strscpy() to execute.
>
> Also I've noticed something strange. I'm not sure why, but certain
> src_len values (e.g. 30) drives branch predictor crazy causing worse than usual results
> for byte-at-a-time copy:

I see something similar, but at the 30->31 transition. The branch-misses
then stay at 1-3% for higher values until 42, where they drop back to 0%.
Anyway, I highly doubt we do a lot of copies of strings longer than 32.

$ perf stat ./strscpy_test b 30 10000000

 Performance counter stats for './strscpy_test b 30 10000000':

        156,777082      task-clock (msec)         #    0,999 CPUs utilized
                 0      context-switches          #    0,000 K/sec
                 0      cpu-migrations            #    0,000 K/sec
                48      page-faults               #    0,306 K/sec
       584.646.177      cycles                    #    3,729 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
     2.580.599.614      instructions              #    4,41  insns per cycle
       660.114.283      branches                  # 4210,528 M/sec
             4.891      branch-misses             #    0,00% of all branches

       0,156970910 seconds time elapsed

$ perf stat ./strscpy_test b 31 10000000

 Performance counter stats for './strscpy_test b 31 10000000':

        258,533250      task-clock (msec)         #    0,999 CPUs utilized
                 0      context-switches          #    0,000 K/sec
                 0      cpu-migrations            #    0,000 K/sec
                50      page-faults               #    0,193 K/sec
       965.505.138      cycles                    #    3,735 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
     2.660.773.463      instructions              #    2,76  insns per cycle
       680.141.051      branches                  # 2630,768 M/sec
        19.150.367      branch-misses             #    2,82% of all branches

       0,258725192 seconds time elapsed

Rasmus
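For readers without the attachment, here is a minimal user-space sketch of the two variants being compared ("b" = byte-at-a-time, "w" = word-at-a-time). The names strscpy_byte() and strscpy_word() are hypothetical and are not taken from Andrey's program. Both assume the kernel strscpy() return convention: the number of characters copied, or -E2BIG if count is 0 or the source was truncated. The word variant also assumes that src is readable for count bytes. The byte loop takes one data-dependent branch per character, so its branch behaviour tracks the string length directly. The word loop tests eight bytes per branch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical stand-ins for "./strscpy {b|w}". Contract assumed from the
 * kernel strscpy(): return chars copied (excluding NUL), or -E2BIG if
 * count == 0 or src did not fit; dest is NUL-terminated when count > 0.
 */

/* Byte-at-a-time: one load, one store and one NUL test per character. */
ssize_t strscpy_byte(char *dest, const char *src, size_t count)
{
	size_t res = 0;

	if (count == 0)
		return -E2BIG;
	while (res < count) {
		char c = src[res];
		dest[res] = c;
		if (c == '\0')
			return (ssize_t)res;
		res++;
	}
	/* No NUL within count bytes: truncate and force termination. */
	dest[count - 1] = '\0';
	return -E2BIG;
}

/* Nonzero iff at least one byte of v is zero (classic bit trick). */
static int has_zero_byte(uint64_t v)
{
	return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/*
 * Word-at-a-time sketch. ASSUMPTION: src is readable for count bytes,
 * because whole 8-byte words are loaded before the NUL is located.
 */
ssize_t strscpy_word(char *dest, const char *src, size_t count)
{
	size_t res = 0;

	if (count == 0)
		return -E2BIG;
	while (count - res >= sizeof(uint64_t)) {
		uint64_t w;
		memcpy(&w, src + res, sizeof(w));	/* unaligned-safe load */
		if (has_zero_byte(w))
			break;				/* NUL is in this word */
		memcpy(dest + res, &w, sizeof(w));
		res += sizeof(w);
	}
	/* Finish (or handle the NUL-containing word) one byte at a time. */
	while (res < count) {
		char c = src[res];
		dest[res] = c;
		if (c == '\0')
			return (ssize_t)res;
		res++;
	}
	dest[count - 1] = '\0';
	return -E2BIG;
}
```

As a usage example, take char src[64] = "hello, world" and char buf[8]. strscpy_byte(buf, src, sizeof(buf)) returns -E2BIG and leaves "hello, " in buf, while strscpy_word() into a 64-byte buffer returns 12. The real kernel version differs: it uses architecture word-at-a-time helpers and limits word reads so they cannot run into an unmapped page.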