On Tue, Jun 04, 2013 at 04:57:57PM +0800, Jianguo Wu wrote:
>Hi all,
>
>I tested memcpy with perf bench, and found that in prefault case, When Transparent Hugepage is on,
>memcpy has worse performance.
>
>When THP on is 3.672879 GB/Sec (with prefault), while THP off is 6.190187 GB/Sec (with prefault).
>

I get a similar result to yours against 3.10-rc4; see the attachment. This is
due to the way THP works: it takes a single page fault for each 2MB virtual
region touched by userland, so each fault has to clear a whole 2MB page rather
than a single 4KB one.

>I think THP will improve performance, but the test result obviously not the case.
>Andrea mentioned THP cause "clear_page/copy_page less cache friendly" in
>http://events.linuxfoundation.org/slides/2011/lfcs/lfcs2011_hpc_arcangeli.pdf.
>
>I am not quite understand this, could you please give me some comments, Thanks!
>
>I test in Linux-3.4-stable, and my machine info is:
>Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
>
>available: 2 nodes (0-1)
>node 0 cpus: 0 1 2 3 8 9 10 11
>node 0 size: 24567 MB
>node 0 free: 23550 MB
>node 1 cpus: 4 5 6 7 12 13 14 15
>node 1 size: 24576 MB
>node 1 free: 23767 MB
>node distances:
>node   0   1
>  0:  10  20
>  1:  20  10
>
>Below is test result:
>---with THP---
>#cat /sys/kernel/mm/transparent_hugepage/enabled
>[always] madvise never
>#./perf bench mem memcpy -l 1gb -o
># Running mem/memcpy benchmark...
># Copying 1gb Bytes ...
>
>       3.672879 GB/Sec (with prefault)
>
>#./perf stat ...
>Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>
>          35455940 cache-misses          #   53.504 % of all cache refs      [49.45%]
>          66267785 cache-references                                          [49.78%]
>              2409 page-faults
>         450768651 dTLB-loads                                                [50.78%]
>             24580 dTLB-misses           #    0.01% of all dTLB cache hits   [51.01%]
>        1338974202 dTLB-stores                                               [50.63%]
>             77943 dTLB-misses                                               [50.24%]
>         697404997 iTLB-loads                                                [49.77%]
>               274 iTLB-misses           #    0.00% of all iTLB cache hits   [49.30%]
>
>       0.855041819 seconds time elapsed
>
>---no THP---
>#cat /sys/kernel/mm/transparent_hugepage/enabled
>always madvise [never]
>
>#./perf bench mem memcpy -l 1gb -o
># Running mem/memcpy benchmark...
># Copying 1gb Bytes ...
>
>       6.190187 GB/Sec (with prefault)
>
>#./perf stat ...
>Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>
>          16920763 cache-misses          #   98.377 % of all cache refs      [50.01%]
>          17200000 cache-references                                          [50.04%]
>            524652 page-faults
>         734365659 dTLB-loads                                                [50.04%]
>           4986387 dTLB-misses           #    0.68% of all dTLB cache hits   [50.04%]
>        1013408298 dTLB-stores                                               [50.04%]
>           8180817 dTLB-misses                                               [49.97%]
>        1526642351 iTLB-loads                                                [50.41%]
>                56 iTLB-misses           #    0.00% of all iTLB cache hits   [50.21%]
>
>       1.025425847 seconds time elapsed
>
>Thanks,
>Jianguo Wu.
---with THP---
#cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
# Running mem/memcpy benchmark...
# Copying 1gb Bytes ...

      12.208522 GB/Sec (with prefault)

Performance counter stats for './perf bench mem memcpy -l 1gb -o':

        26,453,696 cache-misses          #   35.411 % of all cache refs        [57.66%]
        74,704,531 cache-references                                            [58.40%]
             2,297 page-faults
       146,567,960 dTLB-loads                                                  [58.64%]
       211,648,685 dTLB-stores                                                 [58.63%]
            14,533 dTLB-load-misses      #    0.01% of all dTLB cache hits     [57.46%]
               640 iTLB-loads                                                  [55.74%]
           270,881 iTLB-load-misses      # 42325.16% of all iTLB cache hits    [55.17%]

       0.232425109 seconds time elapsed

---no THP---
#cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
# Running mem/memcpy benchmark...
# Copying 1gb Bytes ...

      18.325087 GB/Sec (with prefault)

Performance counter stats for './perf bench mem memcpy -l 1gb -o':

        28,498,544 cache-misses          #   86.167 % of all cache refs        [57.35%]
        33,073,611 cache-references                                            [57.71%]
           524,540 page-faults
       453,500,641 dTLB-loads                                                  [57.99%]
       409,255,606 dTLB-stores                                                 [57.99%]
         2,033,985 dTLB-load-misses      #    0.45% of all dTLB cache hits     [57.52%]
             1,180 iTLB-loads                                                  [56.69%]
           539,056 iTLB-load-misses      # 45682.71% of all iTLB cache hits    [56.02%]

       0.485932214 seconds time elapsed
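
As a rough sanity check on the page-faults counts above, the sketch below
works out the minimum number of first-touch faults for each page size. It
assumes the benchmark touches one 1 GiB source and one 1 GiB destination
buffer; that assumption and the helper name expected_faults are mine and are
not taken from the perf source. The observed values are copied from the
3.10-rc4 results above.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def expected_faults(total_bytes, page_size):
    """Minimum number of first-touch faults needed to populate total_bytes."""
    return -(-total_bytes // page_size)  # ceiling division


# Assumption: a 1 GiB source buffer plus a 1 GiB destination buffer.
touched = 2 * GIB

base_faults = expected_faults(touched, 4 * KIB)  # 4KB base pages
huge_faults = expected_faults(touched, 2 * MIB)  # 2MB transparent hugepages

# page-faults counts reported by perf stat in the 3.10-rc4 runs above.
observed = {"no THP": 524540, "with THP": 2297}

print("no THP:   expected >=", base_faults, "observed", observed["no THP"])
print("with THP: expected >=", huge_faults, "observed", observed["with THP"])
print("bytes cleared per fault:", 4 * KIB, "vs", 2 * MIB)
```

The no-THP count sits within a few hundred faults of the 4KB estimate. The
THP count is above the 2MB lower bound; the remainder presumably comes from
the rest of the process and from areas not backed by huge pages, which I have
not verified.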