Re: Transparent Hugepage impact on memcpy

Hi Jianguo,

On Wed, Jun 5, 2013 at 12:26 PM, Jianguo Wu <wujianguo@xxxxxxxxxx> wrote:
> Hi,
> One more question: I wrote a memcpy test program, mostly the same as perf bench memcpy.
> But the test result isn't consistent with perf bench when THP is off.
>
>         my program                              perf bench
> THP:    3.628368 GB/Sec (with prefault)         3.672879 GB/Sec (with prefault)
> NO-THP: 3.612743 GB/Sec (with prefault)         6.190187 GB/Sec (with prefault)
>
> Below is my code:
>         src = calloc(1, len);
>         dst = calloc(1, len);
>
>         if (prefault)
>                 memcpy(dst, src, len);
>         gettimeofday(&tv_start, NULL);
>         memcpy(dst, src, len);
>         gettimeofday(&tv_end, NULL);
>
>         timersub(&tv_end, &tv_start, &tv_diff);
>         free(src);
>         free(dst);
>
>         speed = (double)((double)len / timeval2double(&tv_diff));
>         print_bps(speed);
>
> This is weird. Is it possible that perf bench does some build optimization?
>
> Thanks,
> Jianguo Wu.

perf bench mem memcpy is built with -O6. This is the compile command
line (you can get it with make V=1):
gcc -o bench/mem-memcpy-x86-64-asm.o -c -fno-omit-frame-pointer -ggdb3
-funwind-tables -Wall -Wextra -std=gnu99 -Werror -O6 .... # omitted
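To check the flags for your own build, something like this should work (a sketch; it assumes you run from the root of a kernel source tree, and that the bench object names match your perf version):

```shell
# Rebuild perf verbosely; V=1 makes the build print each full compile
# command, so we can grep out the lines for the memcpy bench objects.
cd tools/perf
make clean
make V=1 2>&1 | grep 'mem-memcpy'
```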

Can I see the compile options for your test program, and the actual
command line you use to run perf bench mem memcpy?

Thanks,
Hitoshi

>
> On 2013/6/4 16:57, Jianguo Wu wrote:
>
>> Hi all,
>>
>> I tested memcpy with perf bench, and found that in the prefault case, when Transparent Hugepage is on,
>> memcpy has worse performance.
>>
>> With THP on it is 3.672879 GB/Sec (with prefault), while with THP off it is 6.190187 GB/Sec (with prefault).
>>
>> I thought THP would improve performance, but the test result shows the opposite.
>> Andrea mentioned that THP makes "clear_page/copy_page less cache friendly" in
>> http://events.linuxfoundation.org/slides/2011/lfcs/lfcs2011_hpc_arcangeli.pdf.
>>
>> I don't quite understand this; could you please give me some comments? Thanks!
>>
>> I tested on Linux-3.4-stable, and my machine info is:
>> Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
>>
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 1 2 3 8 9 10 11
>> node 0 size: 24567 MB
>> node 0 free: 23550 MB
>> node 1 cpus: 4 5 6 7 12 13 14 15
>> node 1 size: 24576 MB
>> node 1 free: 23767 MB
>> node distances:
>> node   0   1
>>   0:  10  20
>>   1:  20  10
>>
>> Below is test result:
>> ---with THP---
>> #cat /sys/kernel/mm/transparent_hugepage/enabled
>> [always] madvise never
>> #./perf bench mem memcpy -l 1gb -o
>> # Running mem/memcpy benchmark...
>> # Copying 1gb Bytes ...
>>
>>        3.672879 GB/Sec (with prefault)
>>
>> #./perf stat ...
>> Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>>
>>           35455940 cache-misses              #   53.504 % of all cache refs     [49.45%]
>>           66267785 cache-references                                             [49.78%]
>>               2409 page-faults
>>          450768651 dTLB-loads                                                   [50.78%]
>>              24580 dTLB-misses               #    0.01% of all dTLB cache hits  [51.01%]
>>         1338974202 dTLB-stores                                                  [50.63%]
>>              77943 dTLB-misses                                                  [50.24%]
>>          697404997 iTLB-loads                                                   [49.77%]
>>                274 iTLB-misses               #    0.00% of all iTLB cache hits  [49.30%]
>>
>>        0.855041819 seconds time elapsed
>>
>> ---no THP---
>> #cat /sys/kernel/mm/transparent_hugepage/enabled
>> always madvise [never]
>>
>> #./perf bench mem memcpy -l 1gb -o
>> # Running mem/memcpy benchmark...
>> # Copying 1gb Bytes ...
>>
>>        6.190187 GB/Sec (with prefault)
>>
>> #./perf stat ...
>> Performance counter stats for './perf bench mem memcpy -l 1gb -o':
>>
>>           16920763 cache-misses              #   98.377 % of all cache refs     [50.01%]
>>           17200000 cache-references                                             [50.04%]
>>             524652 page-faults
>>          734365659 dTLB-loads                                                   [50.04%]
>>            4986387 dTLB-misses               #    0.68% of all dTLB cache hits  [50.04%]
>>         1013408298 dTLB-stores                                                  [50.04%]
>>            8180817 dTLB-misses                                                  [49.97%]
>>         1526642351 iTLB-loads                                                   [50.41%]
>>                 56 iTLB-misses               #    0.00% of all iTLB cache hits  [50.21%]
>>
>>        1.025425847 seconds time elapsed
>>
>> Thanks,
>> Jianguo Wu.
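The THP on/off comparison above can be scripted; a sketch (needs root for the sysfs write, and the ./perf path is whatever binary you built):

```shell
#!/bin/sh
# Run perf bench mem memcpy with THP forced on, then off.
# Writes the sysfs knob directly, so run as root; the original
# setting is not restored afterwards.
THP=/sys/kernel/mm/transparent_hugepage/enabled

for mode in always never; do
    echo "$mode" > "$THP"
    cat "$THP"
    ./perf bench mem memcpy -l 1gb -o
done
```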

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .