Performance Testing
===================

I've run some limited performance benchmarks:

First, a real-world benchmark that causes a lot of page table manipulation (and therefore we would expect to see regression here if we are going to see it anywhere); kernel compilation. It barely registers a change. Values are times, so smaller is better. All relative to base-4k:

|             | kern    | kern    | user    | user    | real    | real    |
| config      | mean    | stdev   | mean    | stdev   | mean    | stdev   |
|-------------|---------|---------|---------|---------|---------|---------|
| base-4k     |    0.0% |    1.1% |    0.0% |    0.3% |    0.0% |    0.3% |
| compile-4k  |   -0.2% |    1.1% |   -0.2% |    0.3% |   -0.1% |    0.3% |
| boot-4k     |    0.1% |    1.0% |   -0.3% |    0.2% |   -0.2% |    0.2% |

The Speedometer JavaScript benchmark also shows no change. Values are runs per min, so bigger is better. All relative to base-4k:

| config      | mean    | stdev   |
|-------------|---------|---------|
| base-4k     |    0.0% |    0.8% |
| compile-4k  |    0.4% |    0.8% |
| boot-4k     |    0.0% |    0.9% |

Finally, I've run some microbenchmarks known to stress page table manipulations (originally from David Hildenbrand). The fork test maps/allocs 1G of anon memory, then measures the cost of fork(). The munmap test maps/allocs 1G of anon memory then measures the cost of munmap()ing it. The fork test is known to be extremely sensitive to any changes that cause instructions to be aligned differently in cachelines. When using this test for other changes, I've seen double digit regressions for the slightest thing, so 12% regression on this test is actually fairly good. This likely represents the extreme worst case for regressions that will be observed across other microbenchmarks (famous last words). Values are times, so smaller is better. All relative to base-4k:
... and here I am, worrying about much smaller degradation in these micro-benchmarks ;) You're right, these are pure micro-benchmarks, and while 12% does sound like "much", even stupid compiler code movement can result in such changes in the fork() micro-benchmark.
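
For readers who have not seen the tests, here is a rough sketch of the kind of measurement described above: populate an anonymous private mapping, then time fork() or the unmap. This is not the original microbenchmark, whose source is not shown in this thread. The helper names (populate, time_fork, time_munmap) are made up for illustration, and interpreter overhead is included in the timings, so only the shape of the test carries over.

```python
import mmap
import os
import time


def populate(size_bytes):
    """Create a private anonymous mapping and touch every page (hypothetical helper)."""
    buf = mmap.mmap(-1, size_bytes, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    # Write one byte per page so each page is actually allocated and mapped.
    for offset in range(0, size_bytes, mmap.PAGESIZE):
        buf[offset] = 1
    return buf


def time_fork(size_bytes):
    """Return the seconds fork() takes to return in the parent."""
    buf = populate(size_bytes)
    try:
        start = time.perf_counter()
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately without running cleanup
        elapsed = time.perf_counter() - start
        os.waitpid(pid, 0)  # reap the child outside the timed region
        return elapsed
    finally:
        buf.close()


def time_munmap(size_bytes):
    """Return the seconds taken to unmap the populated region."""
    buf = populate(size_bytes)
    start = time.perf_counter()
    buf.close()  # closing the mmap object unmaps the region
    return time.perf_counter() - start
```

Calling time_fork(1 << 30) and time_munmap(1 << 30) would correspond to the 1G case. Repeating each call several times and reporting mean and stdev would match how the numbers above are presented.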
So I think this is just fine, and actually "surprisingly" small. And, there is even a way to statically compile a page size and not worry about that at all.
As discussed ahead of time, I consider this change very valuable. In RHEL, the biggest issue is actually the test matrix, which cannot really be reduced significantly ... but it will make shipping/packaging easier.
CCing Don, who did the separate 64k RHEL flavor kernel.

-- 
Cheers,

David / dhildenb