Hi Hitoshi,

Thanks for your reply! Please see below.

On 2013/6/6 21:54, Hitoshi Mitake wrote:
> Hi Jianguo,
>
> On Wed, Jun 5, 2013 at 12:26 PM, Jianguo Wu <wujianguo@xxxxxxxxxx> wrote:
>> Hi,
>> One more question: I wrote a memcpy test program that is mostly the same as perf bench memcpy,
>> but its results aren't consistent with perf bench when THP is off.
>>
>>             my program                        perf bench
>> THP:        3.628368 GB/Sec (with prefault)   3.672879 GB/Sec (with prefault)
>> NO-THP:     3.612743 GB/Sec (with prefault)   6.190187 GB/Sec (with prefault)
>>
>> Below is my code:
>> src = calloc(1, len);
>> dst = calloc(1, len);
>>
>> if (prefault)
>>         memcpy(dst, src, len);
>> gettimeofday(&tv_start, NULL);
>> memcpy(dst, src, len);
>> gettimeofday(&tv_end, NULL);
>>
>> timersub(&tv_end, &tv_start, &tv_diff);
>> free(src);
>> free(dst);
>>
>> speed = (double)((double)len / timeval2double(&tv_diff));
>> print_bps(speed);
>>
>> This is weird. Is it possible that perf bench does some build-time optimization?
>>
>> Thanks,
>> Jianguo Wu.
>
> perf bench mem memcpy is built with -O6. This is the compile command
> line (you can get it with make V=1):
> gcc -o bench/mem-memcpy-x86-64-asm.o -c -fno-omit-frame-pointer -ggdb3
> -funwind-tables -Wall -Wextra -std=gnu99 -Werror -O6 .... # omitted
>
> Can I see the compile options for your test program and the actual
> command line you use to run perf bench mem memcpy?
>

I just compiled my test program with gcc -o memcpy-test memcpy-test.c.
I also tried the same compile options as perf bench mem memcpy, and the result showed no difference.

My command line for running perf bench mem memcpy:
#./perf bench mem memcpy -l 1gb -o

Thanks,
Jianguo Wu

> Thanks,
> Hitoshi
>
>>
>> On 2013/6/4 16:57, Jianguo Wu wrote:
>>
>>> Hi all,
>>>
>>> I tested memcpy with perf bench and found that, in the prefault case, memcpy performs
>>> worse when Transparent Hugepage is on.
>>>
>>> With THP on the result is 3.672879 GB/Sec (with prefault), while with THP off it is 6.190187 GB/Sec (with prefault).
>>>
>>> I thought THP would improve performance, but the test result clearly shows otherwise.
>>> Andrea mentioned that THP makes "clear_page/copy_page less cache friendly" in
>>> http://events.linuxfoundation.org/slides/2011/lfcs/lfcs2011_hpc_arcangeli.pdf.
>>>
>>> I don't quite understand this; could you please give me some comments? Thanks!
>>>
>>> I tested on Linux-3.4-stable, and my machine info is:
>>> Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
>>>
>>> available: 2 nodes (0-1)
>>> node 0 cpus: 0 1 2 3 8 9 10 11
>>> node 0 size: 24567 MB
>>> node 0 free: 23550 MB
>>> node 1 cpus: 4 5 6 7 12 13 14 15
>>> node 1 size: 24576 MB
>>> node 1 free: 23767 MB
>>> node distances:
>>> node   0   1
>>>   0:  10  20
>>>   1:  20  10
>>>
>>> Below are the test results:
>>> ---with THP---
>>> #cat /sys/kernel/mm/transparent_hugepage/enabled
>>> [always] madvise never
>>> #./perf bench mem memcpy -l 1gb -o
>>> # Running mem/memcpy benchmark...
>>> # Copying 1gb Bytes ...
>>>
>>>        3.672879 GB/Sec (with prefault)
>>>
>>> #./perf stat ...
>>> Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>>>
>>>         35455940 cache-misses              #   53.504 % of all cache refs    [49.45%]
>>>         66267785 cache-references                                            [49.78%]
>>>             2409 page-faults
>>>        450768651 dTLB-loads                                                  [50.78%]
>>>            24580 dTLB-misses               #    0.01% of all dTLB cache hits [51.01%]
>>>       1338974202 dTLB-stores                                                 [50.63%]
>>>            77943 dTLB-misses                                                 [50.24%]
>>>        697404997 iTLB-loads                                                  [49.77%]
>>>              274 iTLB-misses               #    0.00% of all iTLB cache hits [49.30%]
>>>
>>>      0.855041819 seconds time elapsed
>>>
>>> ---no THP---
>>> #cat /sys/kernel/mm/transparent_hugepage/enabled
>>> always madvise [never]
>>>
>>> #./perf bench mem memcpy -l 1gb -o
>>> # Running mem/memcpy benchmark...
>>> # Copying 1gb Bytes ...
>>>
>>>        6.190187 GB/Sec (with prefault)
>>>
>>> #./perf stat ...
>>> Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>>>
>>>         16920763 cache-misses              #   98.377 % of all cache refs    [50.01%]
>>>         17200000 cache-references                                            [50.04%]
>>>           524652 page-faults
>>>        734365659 dTLB-loads                                                  [50.04%]
>>>          4986387 dTLB-misses               #    0.68% of all dTLB cache hits [50.04%]
>>>       1013408298 dTLB-stores                                                 [50.04%]
>>>          8180817 dTLB-misses                                                 [49.97%]
>>>       1526642351 iTLB-loads                                                  [50.41%]
>>>               56 iTLB-misses               #    0.00% of all iTLB cache hits [50.21%]
>>>
>>>      1.025425847 seconds time elapsed
>>>
>>> Thanks,
>>> Jianguo Wu.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx.
For more info on Linux MM, see: http://www.linux-mm.org/ .
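The test program quoted in this thread calls two helpers that were never posted, timeval2double() and print_bps(). Below is a self-contained sketch of that program. Both helpers are hypothetical reimplementations (print_bps() here assumes 1 GB = 2^30 bytes), and parse_thp_mode(), read_thp_mode(), memcpy_bench(), copy_fn and the touch_src flag are illustrative names, not anything taken from perf bench. read_thp_mode() reads the same /sys/kernel/mm/transparent_hugepage/enabled file shown with cat in the results, so each measurement can be tagged with its THP setting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical: call memcpy() through a volatile pointer so the compiler
 * cannot drop the prefault copy or the timed copy into a buffer that is
 * freed right afterwards. */
static void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;

/* Hypothetical stand-in for the unshown timeval2double(): timeval -> seconds. */
double timeval2double(const struct timeval *tv)
{
    return (double)tv->tv_sec + (double)tv->tv_usec / 1e6;
}

/* Hypothetical stand-in for the unshown print_bps(); assumes 1 GB = 2^30 bytes. */
void print_bps(double bps)
{
    printf("%f GB/Sec\n", bps / (1024.0 * 1024.0 * 1024.0));
}

/* Copy the bracketed (active) mode from a line such as
 * "[always] madvise never" into out. Returns 0 on success, -1 otherwise. */
int parse_thp_mode(const char *line, char *out, size_t outlen)
{
    const char *open, *close;
    size_t n;

    assert(line != NULL && out != NULL);
    open = strchr(line, '[');
    close = open ? strchr(open, ']') : NULL;
    if (!open || !close)
        return -1;
    n = (size_t)(close - open - 1);
    if (n >= outlen)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the active THP mode from sysfs; writes "unknown" if unavailable. */
void read_thp_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

    if (!f || !fgets(line, sizeof(line), f) ||
        parse_thp_mode(line, out, outlen) != 0)
        snprintf(out, outlen, "unknown");
    if (f)
        fclose(f);
}

/* Time one copy of len bytes and return bytes/second (negative if
 * allocation fails). prefault mirrors the quoted snippet. touch_src is a
 * hypothetical extra knob that writes src before timing, since an untouched
 * calloc() buffer may be read through the kernel's shared zero page instead
 * of freshly allocated pages. */
double memcpy_bench(size_t len, int prefault, int touch_src)
{
    struct timeval tv_start, tv_end;
    double secs;
    char *src = calloc(1, len);
    char *dst = calloc(1, len);

    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    if (touch_src)
        memset(src, 1, len);
    if (prefault)
        copy_fn(dst, src, len);

    gettimeofday(&tv_start, NULL);
    copy_fn(dst, src, len);
    gettimeofday(&tv_end, NULL);

    /* Equivalent, up to floating-point rounding, to timersub() followed
     * by timeval2double(). */
    secs = timeval2double(&tv_end) - timeval2double(&tv_start);
    free(src);
    free(dst);
    if (secs <= 0.0)
        secs = 1e-6; /* gettimeofday() has 1 us resolution */
    return (double)len / secs;
}
```

A main() could call read_thp_mode() and then print_bps(memcpy_bench((size_t)1 << 30, 1, 0)) to repeat the 1 GB prefaulted measurement, and run it again with touch_src set to 1 to see whether writing the source buffer first changes the number under each THP mode. Routing the copy through a function pointer is a design choice to keep the optimizer from removing copies; perf bench likewise invokes its memcpy routine through a function pointer.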