Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64

Ryan Roberts <ryan.roberts@xxxxxxx> · Mon, 11 Nov 2024 12:25:35 +0000

Hi Petr,

On 11/11/2024 12:14, Petr Tesarik wrote:
> Hi Ryan,
> 
> On Thu, 17 Oct 2024 13:32:43 +0100
> Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> 
>> [...]
>> I understand that SUSE might be able to help with wider performance testing
> 
> Sorry for the delay (vacation, other tasks). Anyway, let me share some
> results with you.

Not at all; thanks for coming back with these results!

> 
> First, I have looked only at 4k pages (constant vs. selected at boot
> time) so far.
> 
> Second, the impact of the patch series is much smaller than I expected.
> Most macro-benchmarks (dbench, io-bench) did not see any significant
> slowdown. There appears to be a performance hit of approx. 1-2%, but
> that's within noise, and I can't dedicate my time to running extensive
> tests to find the distribution peak and compare. In short, I suspect a
> slight performance hit, but I cannot quantify it.
> 
> Third, a few micro-benchmarks saw a significant regression.
> 
> Most notably, getenv and getenvT2 tests from libMicro were 18% and 20%
> slower with variable page size. I don't know why, but I'm looking into
> it. The system() library call was also about 18% slower, but that might
> be related.

OK, ouch. I think there are some things we can try to optimize the
implementation further. But I'll wait for your analysis before digging in myself.

You probably also saw the conversation with Catalin about the cost vs benefit of
this series. Any performance regressions will need to go in the cost column, of
course. So understanding the root cause and reducing the regression as much as
possible will increase the chances of getting it accepted upstream.

Thanks,
Ryan

> 
> The dup() syscall was up to 5% slower (depending on the underlying
> filesystem type).
> 
> VMA unmap was slower for some sizes, but the pattern seemed random,
> sometimes giving even better performance with variable page size, so
> this micro-benchmark may be too unstable to draw any conclusions.
> 
> Stay tuned
> Petr T