On 2024/5/7 18:08, Ryan Roberts wrote:
On 07/05/2024 09:25, Kefeng Wang wrote:
Hi Ryan, Yang and all,
We see another regression on arm64 (no issue on x86) when testing memory
latency with lmbench:
./lat_mem_rd -P 1 512M 128
Do you know exactly what this test is doing?
lat_mem_rd measures memory read latency for varying memory sizes and
strides, see https://lmbench.sourceforge.net/man/lat_mem_rd.8.html
memory latency (smaller is better)
MiB 6.9-rc7 6.9-rc7+revert
And what exactly have you reverted? I'm guessing just commit efa7df3e3bb5 ("mm:
align larger anonymous mappings on THP boundaries")?
Yes, just revert efa7df3e3bb5.
0.00049 1.539 1.539
0.00098 1.539 1.539
0.00195 1.539 1.539
0.00293 1.539 1.539
0.00391 1.539 1.539
0.00586 1.539 1.539
0.00781 1.539 1.539
0.01172 1.539 1.539
0.01562 1.539 1.539
0.02344 1.539 1.539
0.03125 1.539 1.539
0.04688 1.539 1.539
0.0625 1.540 1.540
0.09375 3.634 3.086
So the first regression is for 96K - I'm guessing that's the mmap size? That
size shouldn't even be affected by this patch, apart from a few adds and a
compare which determines the size is too small to do PMD alignment for.
Yes, no anon thp.
0.125 3.874 3.175
0.1875 3.544 3.288
0.25 3.556 3.461
0.375 3.641 3.644
0.5 4.125 3.851
0.75 4.968 4.323
1 5.143 4.686
1.5 5.309 4.957
2 5.370 5.116
3 5.430 5.471
4 5.457 5.671
6 6.100 6.170
8 6.496 6.468
-----------------------
* L1 cache = 8M; there are no big changes below 8M, *
* but latency drops a lot from L2 onward when this patch is reverted *
12 6.917 6.840
16 7.268 7.077
24 7.536 7.345
32 10.723 9.421
48 14.220 11.350
64 16.253 12.189
96 14.494 12.507
128 14.630 12.560
192 15.402 12.967
256 16.178 12.957
384 15.177 13.346
512 15.235 13.233
I quickly checked smaps but didn't find any clues. Any suggestions?
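For anyone repeating the smaps check, here is a minimal sketch of one way to total a per-mapping field such as AnonHugePages across a process. The helper names (parse_kb_field, sum_smaps_field) are made up for illustration, and the choice of AnonHugePages as the field to inspect is an assumption; the thread does not say which fields were looked at.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` starts with "<key>:", parse the number
 * that follows (smaps reports it in kB) into *kb and return 1.
 * Return 0 for any other line.
 */
static int parse_kb_field(const char *line, const char *key, long *kb)
{
	size_t klen = strlen(key);

	assert(kb != NULL);
	if (strncmp(line, key, klen) != 0 || line[klen] != ':')
		return 0;
	return sscanf(line + klen + 1, "%ld", kb) == 1;
}

/*
 * Hypothetical helper: sum `key` over every mapping listed in an
 * smaps-style file. Returns the total in kB, or -1 if the file
 * cannot be opened.
 */
static long sum_smaps_field(const char *path, const char *key)
{
	FILE *f = fopen(path, "r");
	char line[512];
	long total = 0, kb;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (parse_kb_field(line, key, &kb))
			total += kb;
	fclose(f);
	return total;
}
```

Calling sum_smaps_field("/proc/<pid>/smaps", "AnonHugePages") with and without the revert would show whether the benchmark's buffer ends up THP-backed in either case.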
Without knowing exactly what the test does, it's difficult to know what to
The main operation (memory read) is shown below:
#define ONE p = (char **)*p;
#define FIVE ONE ONE ONE ONE ONE
#define TEN FIVE FIVE
#define FIFTY TEN TEN TEN TEN TEN
#define HUNDRED FIFTY FIFTY
while (iterations-- > 0) {
	for (i = 0; i < count; ++i) {
		HUNDRED;
	}
}
https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
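To make the access pattern concrete, here is a simplified, self-contained sketch of the same pointer-chasing idea: every load depends on the result of the previous one, so the time per load approximates memory latency for the given buffer size and stride. This is not lmbench's code. The names build_chain, chase and ns_per_load are hypothetical, and the chain here is a plain forward walk, which may differ from how lat_mem_rd lays out its list.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/*
 * Build a circular chain inside buf: the slot at offset i * stride holds
 * the address of the slot at (i + 1) * stride, and the last slot points
 * back to the first. buf must be suitably aligned (e.g. from malloc).
 */
static char **build_chain(char *buf, size_t len, size_t stride)
{
	size_t n, i;

	assert(stride >= sizeof(char *) && stride % sizeof(char *) == 0);
	assert(len >= stride);
	n = len / stride;
	for (i = 0; i < n; i++) {
		char **slot = (char **)(buf + i * stride);
		*slot = buf + ((i + 1) % n) * stride;
	}
	return (char **)buf;
}

/*
 * Perform `loads` dependent loads starting at p. The final pointer is
 * returned so the compiler cannot discard the loop.
 */
static char **chase(char **p, size_t loads)
{
	while (loads-- > 0)
		p = (char **)*p;	/* same step as the ONE macro */
	return p;
}

/* Time `loads` dependent loads and return the average nanoseconds per load. */
static double ns_per_load(char **p, size_t loads, char ***end)
{
	struct timespec t0, t1;
	double ns;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	*end = chase(p, loads);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
	     (double)(t1.tv_nsec - t0.tv_nsec);
	return ns / (double)loads;
}
```

With a 512M buffer and a stride of 128, matching the command line above, ns_per_load(build_chain(buf, len, 128), loads, &end) gives a rough per-load figure. Since the buffer is one large anonymous allocation, its placement and alignment are exactly what the reverted commit changes.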
suggest. If you want to try something semi-randomly; it might be useful to rule
out the arm64 contpte feature. I don't see how that would be interacting here if
mTHP is disabled (is it?). But it's new for 6.9 and arm64 only. Disable with
ARM64_CONTPTE (needs EXPERT) at compile time.
I haven't enabled mTHP, so it should not be related to ARM64_CONTPTE,
but I will give it a try.