Hi,

One more question: I wrote a memcpy test program, mostly the same as perf bench memcpy,
but the result isn't consistent with perf bench when THP is off.

                my program                         perf bench
THP:            3.628368 GB/Sec (with prefault)    3.672879 GB/Sec (with prefault)
NO-THP:         3.612743 GB/Sec (with prefault)    6.190187 GB/Sec (with prefault)

Below is my code (a full standalone version is at the end of this mail):

	src = calloc(1, len);
	dst = calloc(1, len);

	if (prefault)
		memcpy(dst, src, len);

	gettimeofday(&tv_start, NULL);
	memcpy(dst, src, len);
	gettimeofday(&tv_end, NULL);
	timersub(&tv_end, &tv_start, &tv_diff);

	free(src);
	free(dst);

	speed = (double)len / timeval2double(&tv_diff);
	print_bps(speed);

This is weird; is it possible that perf bench does some build optimization?

Thanks,
Jianguo Wu.

On 2013/6/4 16:57, Jianguo Wu wrote:
> Hi all,
>
> I tested memcpy with perf bench, and found that in the prefault case, when Transparent Hugepage
> is on, memcpy has worse performance.
>
> With THP on it is 3.672879 GB/Sec (with prefault), while with THP off it is 6.190187 GB/Sec
> (with prefault).
>
> I thought THP would improve performance, but the test result is obviously not the case.
> Andrea mentioned that THP makes "clear_page/copy_page less cache friendly" in
> http://events.linuxfoundation.org/slides/2011/lfcs/lfcs2011_hpc_arcangeli.pdf.
>
> I don't quite understand this; could you please give me some comments? Thanks!
>
> I tested on Linux-3.4-stable, and my machine info is:
> Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
>
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 8 9 10 11
> node 0 size: 24567 MB
> node 0 free: 23550 MB
> node 1 cpus: 4 5 6 7 12 13 14 15
> node 1 size: 24576 MB
> node 1 free: 23767 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
>
> Below is the test result:
>
> ---with THP---
> # cat /sys/kernel/mm/transparent_hugepage/enabled
> [always] madvise never
>
> # ./perf bench mem memcpy -l 1gb -o
> # Running mem/memcpy benchmark...
> # Copying 1gb Bytes ...
>
>        3.672879 GB/Sec (with prefault)
>
> # ./perf stat ...
> Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>
>          35455940 cache-misses          #   53.504 % of all cache refs       [49.45%]
>          66267785 cache-references                                           [49.78%]
>              2409 page-faults
>         450768651 dTLB-loads                                                 [50.78%]
>             24580 dTLB-misses           #    0.01% of all dTLB cache hits    [51.01%]
>        1338974202 dTLB-stores                                                [50.63%]
>             77943 dTLB-misses                                                [50.24%]
>         697404997 iTLB-loads                                                 [49.77%]
>               274 iTLB-misses           #    0.00% of all iTLB cache hits    [49.30%]
>
>       0.855041819 seconds time elapsed
>
> ---no THP---
> # cat /sys/kernel/mm/transparent_hugepage/enabled
> always madvise [never]
>
> # ./perf bench mem memcpy -l 1gb -o
> # Running mem/memcpy benchmark...
> # Copying 1gb Bytes ...
>
>        6.190187 GB/Sec (with prefault)
>
> # ./perf stat ...
> Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>
>          16920763 cache-misses          #   98.377 % of all cache refs       [50.01%]
>          17200000 cache-references                                           [50.04%]
>            524652 page-faults
>         734365659 dTLB-loads                                                 [50.04%]
>           4986387 dTLB-misses           #    0.68% of all dTLB cache hits    [50.04%]
>        1013408298 dTLB-stores                                                [50.04%]
>           8180817 dTLB-misses                                                [49.97%]
>        1526642351 iTLB-loads                                                 [50.41%]
>                56 iTLB-misses           #    0.00% of all iTLB cache hits    [50.21%]
>
>       1.025425847 seconds time elapsed
>
> Thanks,
> Jianguo Wu.
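
P.S. For reference, here is a complete standalone version of my test program. The
timeval2double() and print_bps() helpers are simplified stand-ins I wrote to mirror what
I believe the perf bench helpers do (they are not copied from the perf sources), and the
1gb length and prefault flag are hard-coded to match the perf bench options I used
(-l 1gb -o):

#define _GNU_SOURCE		/* for the timersub() macro */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define K 1024UL

/* Convert a struct timeval to seconds as a double. */
static double timeval2double(struct timeval *tv)
{
	return (double)tv->tv_sec + (double)tv->tv_usec / 1e6;
}

/* Print bandwidth in GB/Sec, treating 1 GB as 2^30 bytes. */
static void print_bps(double bps)
{
	printf("%lf GB/Sec (with prefault)\n", bps / (K * K * K));
}

int main(void)
{
	size_t len = K * K * K;		/* 1gb, like perf bench -l 1gb */
	int prefault = 1;		/* like perf bench -o */
	struct timeval tv_start, tv_end, tv_diff;
	void *src, *dst;

	src = calloc(1, len);
	dst = calloc(1, len);
	if (!src || !dst) {
		perror("calloc");
		return 1;
	}

	/* Touch both buffers once so page faults are not timed. */
	if (prefault)
		memcpy(dst, src, len);

	gettimeofday(&tv_start, NULL);
	memcpy(dst, src, len);
	gettimeofday(&tv_end, NULL);
	timersub(&tv_end, &tv_start, &tv_diff);

	free(src);
	free(dst);

	print_bps((double)len / timeval2double(&tv_diff));
	return 0;
}

I compiled it with plain "gcc -O2 memcpy_test.c -o memcpy_test". Since the copied data is
never read back, a different compiler or optimization level might treat the timed memcpy()
differently from the perf build, which is why I am asking about build optimization above.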