On 07.05.24 13:26, Ryan Roberts wrote:
On 07/05/2024 12:14, Ryan Roberts wrote:
On 07/05/2024 12:13, David Hildenbrand wrote:
https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95

If you want to try something semi-random, it might be useful to rule
out the arm64 contpte feature. I don't see how it would be interacting here if
mTHP is disabled (is it?). But it's new in 6.9 and arm64-only. It can be
disabled at compile time via ARM64_CONTPTE (needs EXPERT).
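For completeness, one way to flip that option without editing .config by hand, assuming a typical kernel build flow (the option names are as stated above; the rest is the stock scripts/config helper):

```shell
# From the kernel source tree: enable EXPERT so ARM64_CONTPTE becomes
# visible, then disable it and regenerate a consistent config.
./scripts/config --enable EXPERT --disable ARM64_CONTPTE
make olddefconfig
```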
I didn't enable mTHP, so it shouldn't be related to ARM64_CONTPTE,
but I will give it a try.
cont-pte can get active if we're just lucky when allocating pages in the right
order, correct Ryan?
No it shouldn't do; it requires the pages to be in the same folio.
Ah, my memory comes back. That's also important for folio_pte_batch() to
work as expected currently, I think. We could change that, though, and
let cont-pte batch across folios.
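As a userspace illustration (my own simplification, not the kernel's actual folio_pte_batch() or contpte code; the struct and names are invented) of why the fold has to stop at folio boundaries, here is a sketch of the check being discussed: a span of PTEs only qualifies when all of them map physically consecutive pages belonging to one folio.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified PTE: a physical frame number plus the id of
 * the folio the mapped page belongs to (not the kernel's real types). */
struct fake_pte {
	unsigned long pfn;
	unsigned long folio_id;
};

/* Returns true only if all n PTEs map physically consecutive pages of
 * a single folio -- the precondition a cont-pte fold relies on.  One
 * page from a different folio, or a gap in the pfns, disqualifies the
 * whole span, which is why "lucky" allocation order alone isn't enough. */
bool batch_is_foldable(const struct fake_pte *ptes, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (ptes[i].pfn != ptes[0].pfn + i)
			return false;	/* not physically contiguous */
		if (ptes[i].folio_id != ptes[0].folio_id)
			return false;	/* crosses a folio boundary */
	}
	return true;
}
```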
That said, if we got lucky in allocating the "right" pages, then we would end up
doing an extra function call and a bit of maths for every 16 PTEs in order to
figure out that the span is not contained in a single folio, before backing out
of the attempt to fold. That would probably be just about measurable.
But the regression doesn't kick in until 96K, which is the step after 64K. I'd
expect to see the regression at 64K too if that was the issue. The L1 cache is
64K, so I suspect it could be something cache-related?
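To make the cache hypothesis concrete: lat_mem_rd measures load latency by chasing a chain of dependent pointers through an array of the given size, so once the array (96K) outgrows a 64K L1 cache, every hop starts missing. A userspace sketch of that access pattern (my simplification, not the lmbench code):

```c
#include <stddef.h>
#include <stdlib.h>

/* Build a circular chain of n pointer slots, then follow it for the
 * given number of hops; each load depends on the previous one, so the
 * CPU can't overlap the misses -- the same trick lat_mem_rd uses.
 * Returns the slot index reached, purely as a sanity value. */
size_t chase_chain(size_t n, size_t hops)
{
	char **slots = malloc(n * sizeof *slots);

	/* slots[i] points at slots[i+1]; the last wraps to slots[0]. */
	for (size_t i = 0; i < n; i++)
		slots[i] = (char *)&slots[(i + 1) % n];

	char **p = slots;
	for (size_t i = 0; i < hops; i++)
		p = (char **)*p;	/* dependent load */

	size_t idx = (size_t)(p - slots);
	free(slots);
	return idx;
}
```

With a real timer around the hop loop, latency per hop stays flat while n * sizeof(char *) fits in L1 and jumps at the next size step, which would match a regression appearing at 96K but not 64K.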
--
Cheers,
David / dhildenb