Re: [PATCH v2] mm: Optimized hugepage zeroing & copying from user

Prathu Baronia <prathu.baronia@xxxxxxxxxxx> writes:

> The 04/15/2020 11:27, Huang, Ying wrote:
>> 
>> Can you describe your test?
>> 
> We profile clear_huge_page() using ftrace while force-triggering it in parallel with a simple
> userspace test program which allocates 100MB of anonymous memory and traverses
> it in a loop.
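
For readers following along, a userspace test of the kind described above might look roughly like the sketch below. This is not the actual test code from the thread. The names (map_anon, touch_pages, run_once, TEST_SIZE, PAGE_STEP) are hypothetical, the 4KB base page size is an assumption, and the madvise(MADV_HUGEPAGE) hint is an assumed way to encourage huge page faults when THP is not set to "always".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define TEST_SIZE (100UL << 20)  /* 100MB of anonymous memory, as described */
#define PAGE_STEP 4096UL         /* assumed base page size */

/* Map `size` bytes of anonymous memory; returns NULL on failure. */
static void *map_anon(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
#ifdef MADV_HUGEPAGE
	/* Best-effort hint only; the kernel is free to ignore it. */
	(void)madvise(p, size, MADV_HUGEPAGE);
#endif
	return p;
}

/* Write one byte per base page so that every page is faulted in.
 * Returns the number of pages touched. */
static size_t touch_pages(volatile uint8_t *buf, size_t size)
{
	size_t touched = 0;

	assert(buf != NULL);
	for (size_t off = 0; off < size; off += PAGE_STEP) {
		buf[off] = 1;
		touched++;
	}
	return touched;
}

/* One iteration: map, traverse, unmap. Returns pages touched, 0 on failure. */
static size_t run_once(size_t size)
{
	void *p = map_anon(size);
	size_t touched;

	if (!p)
		return 0;
	touched = touch_pages(p, size);
	munmap(p, size);
	return touched;
}
```

Calling run_once(TEST_SIZE) repeatedly from main() while ftrace is recording would keep generating fresh anonymous faults. Whether those faults actually reach clear_huge_page() depends on the THP configuration and on how the mapping happens to be aligned, so the trace should be checked rather than assumed.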
>> 
>> You have tested the chunk sizes 4KB and 2MB, can you test some values in
>> between?  For example 32KB or 64KB?  Maybe there's a sweet point with
>> some smaller granularity and good performance.
> Based on your advice I tried chunk sizes of 4KB, 8KB, 16KB, 32KB and 64KB on
> arm64 and x86_64 by copying the kernel memset implementation for both architectures.
> -------------------------------------------------------------------------------
> Results (the sample size is 100 for each and the values are in us):
> -------------------------------------------------------------------------------
> ARM64 (CPU0 and CPU6 on and set to max frequency, DDR set to performance governor):
> -------------------------------------------------------------------------------
> Chunk Size = 4KB
> -----------------
> Oneshot
> 	Mean : 3402.06
> 	Stddev : 72.6576
> Forward
> 	Mean : 3408.04
> 	Stddev : 72.976
> Reverse
> 	Mean : 17699.3
> 	Stddev : 132.875
> -----------------
> Chunk Size = 8KB
> -----------------
> Oneshot
> 	Mean : 3398.64
> 	Stddev : 80.6334
> Forward
> 	Mean : 3391.58
> 	Stddev : 65.9063
> Reverse
> 	Mean : 13909.2
> 	Stddev : 194.324
> -----------------
> Chunk Size = 16KB
> -----------------
> Oneshot
> 	Mean : 3393.57
> 	Stddev : 72.2485
> Forward
> 	Mean : 3404.69
> 	Stddev : 84.4705
> Reverse
> 	Mean : 9278.65
> 	Stddev : 217.725
> -----------------
> Chunk Size = 32KB
> -----------------
> Oneshot
> 	Mean : 3425.7
> 	Stddev : 129.156
> Forward
> 	Mean : 3402.07
> 	Stddev : 82.6713
> Reverse
> 	Mean : 6831.43
> 	Stddev : 184.807
> -----------------
> Chunk Size = 64KB
> -----------------
> Oneshot
> 	Mean : 3398.72
> 	Stddev : 77.9703
> Forward
> 	Mean : 3413.52
> 	Stddev : 173.121
> Reverse
> 	Mean : 5542.84
> 	Stddev : 197.017

Maybe a little larger chunk size is good enough for ARM64?
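
To make the three cases concrete, here is a minimal userspace sketch of how the buffer could be cleared in one shot, in ascending chunks, or in descending chunks. The reading of "Oneshot", "Forward" and "Reverse" used here is an assumption, since the measurement code is not shown in this thread, and clear_chunked and enum clear_order are hypothetical names. Plain memset() stands in for the copied kernel memset implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed meaning of the three rows in the result tables. */
enum clear_order { CLEAR_ONESHOT, CLEAR_FORWARD, CLEAR_REVERSE };

/*
 * Zero `size` bytes at `buf`.
 * CLEAR_ONESHOT issues a single memset() over the whole range.
 * CLEAR_FORWARD / CLEAR_REVERSE issue one memset() per `chunk` bytes,
 * walking the chunks in ascending / descending address order.
 * Returns the number of memset() calls made.
 */
static size_t clear_chunked(void *buf, size_t size, size_t chunk,
			    enum clear_order order)
{
	unsigned char *p = buf;
	size_t nchunks, calls = 0;

	if (order == CLEAR_ONESHOT) {
		memset(p, 0, size);
		return 1;
	}

	/* The chunked variants need the size to be a multiple of chunk. */
	assert(chunk > 0 && size % chunk == 0);
	nchunks = size / chunk;

	for (size_t i = 0; i < nchunks; i++) {
		size_t idx = (order == CLEAR_FORWARD) ? i : nchunks - 1 - i;

		memset(p + idx * chunk, 0, chunk);
		calls++;
	}
	return calls;
}
```

With a 2MB buffer and a 4KB chunk, the forward and reverse variants each make 512 calls and differ only in the direction in which addresses are visited, which is the variable the ARM64 numbers above appear to be sensitive to.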

> ---------------------------------------------
> x86_64(Only CPU0 on and set to max frequency)
> ---------------------------------------------
> Chunk Size = 4KB
> -----------------
> Oneshot
> 	Mean : 6752.59
> 	Stddev : 298.988
> Forward
> 	Mean : 6873.6
> 	Stddev : 325.607
> Reverse
> 	Mean : 6722.88
> 	Stddev : 365.837
> -----------------
> Chunk Size = 8KB
> -----------------
> Oneshot
> 	Mean : 6848.57
> 	Stddev : 955.312
> Forward
> 	Mean : 7012.24
> 	Stddev : 1377.27
> Reverse
> 	Mean : 6688.83
> 	Stddev : 589.935
> -----------------
> Chunk Size = 16KB
> -----------------
> Oneshot
> 	Mean : 6846.87
> 	Stddev : 546.173
> Forward
> 	Mean : 6785.26
> 	Stddev : 248.022
> Reverse
> 	Mean : 6613.33
> 	Stddev : 350.003
> -----------------
> Chunk Size = 32KB
> -----------------
> Oneshot
> 	Mean : 6862.19
> 	Stddev : 870.524
> Forward
> 	Mean : 6826.3
> 	Stddev : 870.023
> Reverse
> 	Mean : 6747.69
> 	Stddev : 1047.5
> -----------------
> Chunk Size = 64KB
> -----------------
> Oneshot
> 	Mean : 6806.9
> 	Stddev : 609.112
> Forward
> 	Mean : 6774.53
> 	Stddev : 311.954
> Reverse
> 	Mean : 6553.47
> 	Stddev : 293.52

As I understand it, x86 does not benefit from the change at all.

Best Regards,
Huang, Ying



