Prathu Baronia <prathu.baronia@xxxxxxxxxxx> writes:

> The 04/15/2020 11:27, Huang, Ying wrote:
>>
>> Can you describe your test?
>>
> We profile clear_huge_page() using ftrace while force-triggering it in
> parallel with a simple userspace test program which allocates 100MB of
> anon memory and traverses it in a loop.
>>
>> You have tested the chunk sizes 4KB and 2MB, can you test some values in
>> between?  For example 32KB or 64KB?  Maybe there's a sweet point with
>> some smaller granularity and good performance.
>
> Based on your advice I tried chunk sizes of 4KB, 8KB, 16KB, 32KB and 64KB on
> arm64 and x86_64 by copying the kernel memset implementation for both the archs.
> -------------------------------------------------------------------------------
> Results (the sample size is 100 for each and the values are in us):-
> -------------------------------------------------------------------------------
> ARM64 (CPU0 & 6 on and set at max frequency, DDR set to performance governor):-
> -------------------------------------------------------------------------------
> Chunk Size = 4KB
> -----------------
> Oneshot
>   Mean   : 3402.06
>   Stddev : 72.6576
> Forward
>   Mean   : 3408.04
>   Stddev : 72.976
> Reverse
>   Mean   : 17699.3
>   Stddev : 132.875
> -----------------
> Chunk Size = 8KB
> -----------------
> Oneshot
>   Mean   : 3398.64
>   Stddev : 80.6334
> Forward
>   Mean   : 3391.58
>   Stddev : 65.9063
> Reverse
>   Mean   : 13909.2
>   Stddev : 194.324
> -----------------
> Chunk Size = 16KB
> -----------------
> Oneshot
>   Mean   : 3393.57
>   Stddev : 72.2485
> Forward
>   Mean   : 3404.69
>   Stddev : 84.4705
> Reverse
>   Mean   : 9278.65
>   Stddev : 217.725
> -----------------
> Chunk Size = 32KB
> -----------------
> Oneshot
>   Mean   : 3425.7
>   Stddev : 129.156
> Forward
>   Mean   : 3402.07
>   Stddev : 82.6713
> Reverse
>   Mean   : 6831.43
>   Stddev : 184.807
> -----------------
> Chunk Size = 64KB
> -----------------
> Oneshot
>   Mean   : 3398.72
>   Stddev : 77.9703
> Forward
>   Mean   : 3413.52
>   Stddev : 173.121
> Reverse
>   Mean   : 5542.84
>   Stddev : 197.017

Maybe a slightly larger chunk size is good enough for ARM64?

> ---------------------------------------------
> x86_64 (Only CPU0 on and set to max frequency)
> ---------------------------------------------
> Chunk Size = 4KB
> -----------------
> Oneshot
>   Mean   : 6752.59
>   Stddev : 298.988
> Forward
>   Mean   : 6873.6
>   Stddev : 325.607
> Reverse
>   Mean   : 6722.88
>   Stddev : 365.837
> -----------------
> Chunk Size = 8KB
> -----------------
> Oneshot
>   Mean   : 6848.57
>   Stddev : 955.312
> Forward
>   Mean   : 7012.24
>   Stddev : 1377.27
> Reverse
>   Mean   : 6688.83
>   Stddev : 589.935
> -----------------
> Chunk Size = 16KB
> -----------------
> Oneshot
>   Mean   : 6846.87
>   Stddev : 546.173
> Forward
>   Mean   : 6785.26
>   Stddev : 248.022
> Reverse
>   Mean   : 6613.33
>   Stddev : 350.003
> -----------------
> Chunk Size = 32KB
> -----------------
> Oneshot
>   Mean   : 6862.19
>   Stddev : 870.524
> Forward
>   Mean   : 6826.3
>   Stddev : 870.023
> Reverse
>   Mean   : 6747.69
>   Stddev : 1047.5
> -----------------
> Chunk Size = 64KB
> -----------------
> Oneshot
>   Mean   : 6806.9
>   Stddev : 609.112
> Forward
>   Mean   : 6774.53
>   Stddev : 311.954
> Reverse
>   Mean   : 6553.47
>   Stddev : 293.52

As I understand it, x86 does not benefit from the change.

Best Regards,
Huang, Ying
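
For anyone wanting to reproduce this, the userspace test described above (allocate 100MB of anon memory and traverse it in a loop) could look roughly like the sketch below. This is not the code used for the measurements in this thread. The names REGION_SIZE, touch_region() and run_once() are made up for illustration, and the MADV_HUGEPAGE hint is an assumption about how the huge page faults were triggered.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* 100MB, matching the size mentioned in the test description. */
#define REGION_SIZE (100UL << 20)

/*
 * Write one byte every `step` bytes so that each page in the range is
 * faulted in. Returns the number of writes performed.
 */
static size_t touch_region(volatile unsigned char *buf, size_t len, size_t step)
{
	size_t touched = 0;

	assert(step != 0);	/* a zero step would never terminate */
	for (size_t off = 0; off < len; off += step) {
		buf[off] = 1;
		touched++;
	}
	return touched;
}

/*
 * Map `len` bytes of anonymous memory, touch every base page once, then
 * unmap it again. Returns the number of pages touched, or 0 if the
 * mapping failed.
 */
static size_t run_once(size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 0;
#ifdef MADV_HUGEPAGE
	/* Assumption: ask for THP; this is only a hint and may be ignored. */
	madvise(p, len, MADV_HUGEPAGE);
#endif
	size_t n = touch_region(p, len, page);

	munmap(p, len);
	return n;
}
```

Calling run_once(REGION_SIZE) repeatedly from main() should keep generating anonymous huge page faults while clear_huge_page() is being profiled, provided /sys/kernel/mm/transparent_hugepage/enabled is set to "always" or "madvise". Unmapping after each pass matters: without it, later passes hit already-populated pages and no further clearing happens.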