Re: Transparent Hugepage impact on memcpy

On Tue, Jun 04, 2013 at 04:57:57PM +0800, Jianguo Wu wrote:
>Hi all,
>
>I tested memcpy with perf bench and found that, in the prefault case, memcpy performs worse
>when Transparent Hugepage is on.
>
>With THP on it is 3.672879 GB/Sec (with prefault), while with THP off it is 6.190187 GB/Sec (with prefault).
>

I get a similar result to yours against 3.10-rc4; see the attachment. This
is due to the way THP takes a single page fault for each 2MB virtual
region touched by userland.
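
As a rough illustration of that point, the sketch below estimates how many
first-touch page faults a 1gb memcpy would take with 4KB pages versus 2MB
hugepages. It assumes the benchmark touches two 1 GiB buffers (source and
destination) and that each page faults exactly once; the helper name
estimated_faults is made up for this example.

```python
# Back-of-envelope estimate of first-touch page faults.
# Assumption: two 1 GiB buffers (source and destination), each page faulted once.
GIB = 1 << 30
BASE_PAGE = 4 << 10   # 4 KiB base page on x86-64
HUGE_PAGE = 2 << 20   # 2 MiB transparent hugepage on x86-64


def estimated_faults(total_bytes, page_size):
    """Number of pages (hence first-touch faults) needed to cover total_bytes."""
    return -(-total_bytes // page_size)  # ceiling division


touched = 2 * GIB
faults_4k = estimated_faults(touched, BASE_PAGE)
faults_2m = estimated_faults(touched, HUGE_PAGE)
print("4 KiB pages:", faults_4k)  # 524288
print("2 MiB pages:", faults_2m)  # 1024
```

The 4KB estimate is close to the page-faults count reported with THP off
(524,540). The count with THP on (2,297) is in the low thousands rather than
exactly 1024; this simple estimate ignores process start-up faults and any
other faults the benchmark takes.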

>I thought THP would improve performance, but the test result obviously shows that is not the case.
>Andrea mentioned that THP makes "clear_page/copy_page less cache friendly" in
>http://events.linuxfoundation.org/slides/2011/lfcs/lfcs2011_hpc_arcangeli.pdf.
>
>I do not quite understand this; could you please give me some comments? Thanks!
>
>I tested on Linux-3.4-stable, and my machine info is:
>Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
>
>available: 2 nodes (0-1)
>node 0 cpus: 0 1 2 3 8 9 10 11
>node 0 size: 24567 MB
>node 0 free: 23550 MB
>node 1 cpus: 4 5 6 7 12 13 14 15
>node 1 size: 24576 MB
>node 1 free: 23767 MB
>node distances:
>node   0   1 
>  0:  10  20 
>  1:  20  10
>
>Below is test result:
>---with THP---
>#cat /sys/kernel/mm/transparent_hugepage/enabled
>[always] madvise never
>#./perf bench mem memcpy -l 1gb -o
># Running mem/memcpy benchmark...
># Copying 1gb Bytes ...
>
>       3.672879 GB/Sec (with prefault)
>
>#./perf stat ...
>Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>
>          35455940 cache-misses              #   53.504 % of all cache refs     [49.45%]
>          66267785 cache-references                                             [49.78%]
>              2409 page-faults                                                 
>         450768651 dTLB-loads                                                   [50.78%]
>             24580 dTLB-misses               #    0.01% of all dTLB cache hits  [51.01%]
>        1338974202 dTLB-stores                                                  [50.63%]
>             77943 dTLB-misses                                                  [50.24%]
>         697404997 iTLB-loads                                                   [49.77%]
>               274 iTLB-misses               #    0.00% of all iTLB cache hits  [49.30%]
>
>       0.855041819 seconds time elapsed
>
>---no THP---
>#cat /sys/kernel/mm/transparent_hugepage/enabled
>always madvise [never]
>
>#./perf bench mem memcpy -l 1gb -o
># Running mem/memcpy benchmark...
># Copying 1gb Bytes ...
>
>       6.190187 GB/Sec (with prefault)
>
>#./perf stat ...
>Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>
>          16920763 cache-misses              #   98.377 % of all cache refs     [50.01%]
>          17200000 cache-references                                             [50.04%]
>            524652 page-faults                                                 
>         734365659 dTLB-loads                                                   [50.04%]
>           4986387 dTLB-misses               #    0.68% of all dTLB cache hits  [50.04%]
>        1013408298 dTLB-stores                                                  [50.04%]
>           8180817 dTLB-misses                                                  [49.97%]
>        1526642351 iTLB-loads                                                   [50.41%]
>                56 iTLB-misses               #    0.00% of all iTLB cache hits  [50.21%]
>
>       1.025425847 seconds time elapsed
>
>Thanks,
>Jianguo Wu.
>
---with THP---
#cat  /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
# Running mem/memcpy benchmark...
# Copying 1gb Bytes ...

      12.208522 GB/Sec (with prefault)

 Performance counter stats for './perf bench mem memcpy -l 1gb -o':

        26,453,696 cache-misses              #   35.411 % of all cache refs     [57.66%]
        74,704,531 cache-references                                             [58.40%]
             2,297 page-faults                                                 
       146,567,960 dTLB-loads                                                   [58.64%]
       211,648,685 dTLB-stores                                                  [58.63%]
            14,533 dTLB-load-misses          #    0.01% of all dTLB cache hits  [57.46%]
               640 iTLB-loads                                                   [55.74%]
           270,881 iTLB-load-misses          #  42325.16% of all iTLB cache hits  [55.17%]

       0.232425109 seconds time elapsed

---no THP---
#cat  /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

# Running mem/memcpy benchmark...
# Copying 1gb Bytes ...

      18.325087 GB/Sec (with prefault)

 Performance counter stats for './perf bench mem memcpy -l 1gb -o':

        28,498,544 cache-misses              #   86.167 % of all cache refs     [57.35%]
        33,073,611 cache-references                                             [57.71%]
           524,540 page-faults                                                 
       453,500,641 dTLB-loads                                                   [57.99%]
       409,255,606 dTLB-stores                                                  [57.99%]
         2,033,985 dTLB-load-misses          #    0.45% of all dTLB cache hits  [57.52%]
             1,180 iTLB-loads                                                   [56.69%]
           539,056 iTLB-load-misses          #  45682.71% of all iTLB cache hits  [56.02%]

       0.485932214 seconds time elapsed
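
To put the two 3.10-rc4 runs above side by side, here is a small sketch that
computes THP-on / THP-off ratios for throughput, elapsed time and page
faults. The figures are copied from the output above; the dictionary layout
and the ratio helper are made up for this example.

```python
# Figures copied from the two perf runs on 3.10-rc4 shown above.
runs = {
    "thp": {"gb_per_sec": 12.208522, "elapsed_s": 0.232425109, "page_faults": 2297},
    "no_thp": {"gb_per_sec": 18.325087, "elapsed_s": 0.485932214, "page_faults": 524540},
}


def ratio(metric):
    """THP-on value divided by THP-off value for the given metric."""
    return runs["thp"][metric] / runs["no_thp"][metric]


throughput_ratio = ratio("gb_per_sec")
elapsed_ratio = ratio("elapsed_s")
fault_ratio = ratio("page_faults")

for name, value in [
    ("throughput", throughput_ratio),
    ("elapsed time", elapsed_ratio),
    ("page faults", fault_ratio),
]:
    print(f"{name:>12}: THP on / THP off = {value:.4f}")
```

On these numbers the measured copy with THP runs at about two thirds of the
no-THP rate, while the elapsed time of the whole perf run is about half.
The elapsed time covers the entire process, including prefaulting, so the
two ratios measure different things.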
