On Wed, Jul 31, 2024 at 11:15:05AM +0200, Thomas Gleixner wrote:
> On Wed, Jul 31 2024 at 14:27, Shivank Garg wrote:
> > lmbench:lat_pagefault: Metric- page-fault time (us) - Lower is better
> >                 4-Level PT              5-Level PT              % Change
> > THP-never       Mean:0.4068             Mean:0.4294             5.56
> >                 95% CI:0.4057-0.4078    95% CI:0.4287-0.4302
> >
> > THP-Always      Mean: 0.4061            Mean: 0.4288            % Change
> >                 95% CI: 0.4051-0.4071   95% CI: 0.4281-0.4295   5.59
> >
> > Inference:
> > 5-level page table shows increase in page-fault latency but it does
> > not significantly impact other benchmarks.
>
> 5% regression on lmbench is a NONO.
>
> 5-level page tables add a cost in every hardware page table walk. That's
> a matter of fact and there is absolutely no reason to inflict this cost
> on everyone.
>
> The solution to this to make the 5-level mechanics smarter by evaluating
> whether the machine has enough memory to require 5-level tables and
> select the depth at boot time.

I gotta mention (again) that it's a pain we can't mix and match like
s390. By default they run their userspace on 4-level, even if the kernel
runs 5. Only silly daft userspace that needs insane amounts of memory
gets 5-level.
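
For illustration only, the boot-time selection suggested in the quoted text could look roughly like the sketch below. It is not taken from any posted patch. The helper name pick_pt_levels() and its parameters are hypothetical, and the 46-bit limit is an assumption based on the documented 64 TiB direct-map size for 4-level paging on x86-64. A real implementation would also have to account for hot-pluggable memory and device memory, which this ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: with 4-level paging, x86-64 Linux can direct-map at most
 * 2^46 bytes (64 TiB) of physical memory.
 */
#define FOUR_LEVEL_MAX_PHYS_BITS 46

/*
 * Hypothetical helper: choose the page-table depth at boot.
 *
 * max_phys_end - exclusive end of the highest physical address in use
 * cpu_has_la57 - whether the CPU supports 5-level paging at all
 *
 * Returns 5 only when the CPU supports it and memory extends beyond what
 * 4-level paging can cover; otherwise returns 4.
 */
static int pick_pt_levels(uint64_t max_phys_end, bool cpu_has_la57)
{
	const uint64_t limit = UINT64_C(1) << FOUR_LEVEL_MAX_PHYS_BITS;

	if (cpu_has_la57 && max_phys_end > limit)
		return 5;

	return 4;
}
```

With this policy a machine whose memory ends at or below 64 TiB stays on 4-level tables and avoids the extra level in every hardware walk, even when the CPU advertises LA57.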