Re: [PATCH] lib/strscpy: remove word-at-a-time optimization.

On 2018-01-09 17:47, Andrey Ryabinin wrote:
> Attached user space program I used to see the difference.
> Usage:
> 	gcc -O2 -o strscpy strscpy_test.c
> 	./strscpy {b|w} src_str_len count
> 
> src_str_len - length of the source string, between 1 and 4096
> count - how many strscpy() calls to execute.
>  
> Also, I've noticed something strange. I'm not sure why, but certain
> src_len values (e.g. 30) drive the branch predictor crazy, causing
> worse than usual results for byte-at-a-time copy:

I see something similar, but at the 30->31 transition, and the
branch-misses remain at 1-3% for higher values, until 42 where they drop
back to 0%. Anyway, I highly doubt we do a lot of string copies of
strings longer than 32.

$ perf stat ./strscpy_test b 30 10000000

 Performance counter stats for './strscpy_test b 30 10000000':

        156,777082      task-clock (msec)         #    0,999 CPUs utilized
                 0      context-switches          #    0,000 K/sec
                 0      cpu-migrations            #    0,000 K/sec
                48      page-faults               #    0,306 K/sec
       584.646.177      cycles                    #    3,729 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
     2.580.599.614      instructions              #    4,41  insns per cycle
       660.114.283      branches                  # 4210,528 M/sec
             4.891      branch-misses             #    0,00% of all branches

       0,156970910 seconds time elapsed

$ perf stat ./strscpy_test b 31 10000000

 Performance counter stats for './strscpy_test b 31 10000000':

        258,533250      task-clock (msec)         #    0,999 CPUs utilized
                 0      context-switches          #    0,000 K/sec
                 0      cpu-migrations            #    0,000 K/sec
                50      page-faults               #    0,193 K/sec
       965.505.138      cycles                    #    3,735 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
     2.660.773.463      instructions              #    2,76  insns per cycle
       680.141.051      branches                  # 2630,768 M/sec
        19.150.367      branch-misses             #    2,82% of all branches

       0,258725192 seconds time elapsed
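
Spread over the 10000000 calls (process startup included), that is
roughly 58 cycles per call at length 30 versus roughly 97 at length 31,
with about 1.9 branch misses per call in the second run. For readers
without the attachment, a byte-at-a-time copy with strscpy()-like
semantics has roughly the shape below. This is only a sketch: it is not
the code from strscpy_test.c or lib/string.c, and the name
strscpy_bytewise is made up for illustration.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */
#include <errno.h>	/* E2BIG */

/*
 * Hypothetical byte-at-a-time copy with strscpy()-like semantics
 * (assumed, not taken from the test program): copy at most count-1
 * bytes, always NUL-terminate when count > 0, and return the number
 * of bytes copied (excluding the NUL) or -E2BIG on truncation.
 */
static ssize_t strscpy_bytewise(char *dest, const char *src, size_t count)
{
	size_t i;

	if (count == 0)
		return -E2BIG;

	for (i = 0; i < count; i++) {
		char c = src[i];

		dest[i] = c;
		/* Data-dependent exit: taken once per call, at the NUL. */
		if (c == '\0')
			return (ssize_t)i;
	}

	/* Source did not fit: terminate and report truncation. */
	dest[count - 1] = '\0';
	return -E2BIG;
}
```

Each iteration executes two conditional branches (the bound check and
the NUL check), so the number of times each is taken before the loop
exits depends directly on the source length.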


Rasmus


