Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries

Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> · Tue, 7 May 2024 18:59:31 +0800

On 2024/5/7 18:08, Ryan Roberts wrote:
On 07/05/2024 09:25, Kefeng Wang wrote:
Hi Ryan, Yang and all,

We see another regression on arm64 (no issue on x86) when testing memory
latency with lmbench:

./lat_mem_rd -P 1 512M 128

Do you know exactly what this test is doing?

lat_mem_rd measures memory read latency for varying memory sizes and
strides, see https://lmbench.sourceforge.net/man/lat_mem_rd.8.html

memory latency(smaller is better)

MiB     6.9-rc7    6.9-rc7+revert

And what exactly have you reverted? I'm guessing just commit efa7df3e3bb5 ("mm:
align larger anonymous mappings on THP boundaries")?

Yes, just revert efa7df3e3bb5.

0.00049    1.539     1.539
0.00098    1.539     1.539
0.00195    1.539     1.539
0.00293    1.539     1.539
0.00391    1.539     1.539
0.00586    1.539     1.539
0.00781    1.539     1.539
0.01172    1.539     1.539
0.01562    1.539     1.539
0.02344    1.539     1.539
0.03125    1.539     1.539
0.04688    1.539     1.539
0.0625    1.540     1.540
0.09375    3.634     3.086

So the first regression is for 96K - I'm guessing that's the mmap size? That
size shouldn't even be affected by this patch, apart from a few adds and a
compare which determines the size is too small to do PMD alignment for.

Yes, no anon thp.

0.125   3.874     3.175
0.1875  3.544     3.288
0.25    3.556     3.461
0.375   3.641     3.644
0.5     4.125     3.851
0.75    4.968     4.323
1       5.143     4.686
1.5     5.309     4.957
2       5.370     5.116
3       5.430     5.471
4       5.457     5.671
6       6.100     6.170
8       6.496     6.468

-----------------------
* L1 cache = 8M; there is no big change below 8M, *
* but from L2 onwards the latency drops a lot when this patch is reverted *

12      6.917     6.840
16      7.268     7.077
24      7.536     7.345
32      10.723     9.421
48      14.220     11.350
64      16.253     12.189
96      14.494     12.507
128     14.630     12.560
192     15.402     12.967
256     16.178     12.957
384     15.177     13.346
512     15.235     13.233

I quickly checked smaps but didn't find any clues. Any suggestions?

Without knowing exactly what the test does, it's difficult to know what to

The major operation (memory read) is shown below:

#define    ONE      p = (char **)*p;
#define    FIVE     ONE ONE ONE ONE ONE
#define    TEN      FIVE FIVE
#define    FIFTY    TEN TEN TEN TEN TEN
#define    HUNDRED  FIFTY FIFTY

    while (iterations-- > 0) {
        for (i = 0; i < count; ++i) {
            HUNDRED;
        }
    }

https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
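
For readers who have not looked at lmbench, the macro chain above is a dependent-load pointer chase: each load yields the address of the next load, so the CPU cannot overlap them and the time per step approximates the memory read latency. Below is a minimal user-space sketch of that idea, assuming a simple forward-stride ring. The helper names `build_ring` and `chase` are hypothetical and are not lmbench functions; lmbench's own chain setup and timing code differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: link buf into a ring of pointers. The slot at
 * offset i * stride holds the address of the slot at (i + 1) * stride,
 * and the last slot points back to offset 0. Returns the slot count.
 * Assumes buf is suitably aligned (e.g. from malloc) and that stride
 * is a non-zero multiple of sizeof(char *).
 */
static size_t build_ring(char *buf, size_t size, size_t stride)
{
    assert(stride >= sizeof(char *) && stride % sizeof(char *) == 0);

    size_t n = size / stride;
    for (size_t i = 0; i < n; i++) {
        char **slot = (char **)(buf + i * stride);
        *slot = buf + ((i + 1) % n) * stride;
    }
    return n;
}

/*
 * Follow the chain for the given number of steps. Each iteration is the
 * same operation as the ONE macro: a load whose result is the next
 * address. Returning p keeps the compiler from discarding the loop.
 */
static char **chase(char **p, size_t steps)
{
    while (steps-- > 0)
        p = (char **)*p;
    return p;
}
```

Timing a large number of `chase` steps over buffers of increasing size, with a fixed stride such as the 128 bytes used in the command line above, gives the kind of size-versus-latency curve shown in the table.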

suggest. If you want to try something semi-randomly, it might be useful to rule
out the arm64 contpte feature. I don't see how that would be interacting here if
mTHP is disabled (is it?). But it's new for 6.9 and arm64 only. Disable with
ARM64_CONTPTE (needs EXPERT) at compile time.
I haven't enabled mTHP, so it should not be related to ARM64_CONTPTE,
but I will give it a try.
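
The THP and mTHP state can be confirmed at run time from sysfs, where the active choice is shown in square brackets (for example "always [madvise] never"). Below is a minimal sketch for reading it. The helper names `selected_option` and `read_selected` are hypothetical. The global setting lives in /sys/kernel/mm/transparent_hugepage/enabled; the per-size mTHP files live under /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled, and which sizes exist depends on the kernel's base page size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed token of a sysfs choice line
 * such as "always [madvise] never" into out. Returns 0 on success, or
 * -1 if there is no bracketed token or it does not fit in out.
 */
static int selected_option(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (!open || !close)
        return -1;

    size_t n = (size_t)(close - open - 1);
    if (n + 1 > outlen)
        return -1;

    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/*
 * Hypothetical helper: read the first line of a sysfs file and extract
 * the selected option. Returns -1 if the file cannot be read.
 */
static int read_selected(const char *path, char *out, size_t outlen)
{
    char line[256];
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char *got = fgets(line, sizeof line, f);
    fclose(f);
    return got ? selected_option(line, out, outlen) : -1;
}
```

Calling `read_selected` on the global file and on each hugepages-<size>kB/enabled file present on the test machine would show whether any mTHP size is active during the lat_mem_rd run.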