On Wed, Jul 31, 2024 at 11:15:05AM +0200, Thomas Gleixner wrote:
> On Wed, Jul 31 2024 at 14:27, Shivank Garg wrote:
> > lmbench:lat_pagefault: Metric- page-fault time (us) - Lower is better
> >                 4-Level PT              5-Level PT              % Change
> > THP-never       Mean:0.4068             Mean:0.4294             5.56
> >                 95% CI:0.4057-0.4078    95% CI:0.4287-0.4302
> >
> > THP-Always      Mean: 0.4061            Mean: 0.4288            5.59
> >                 95% CI: 0.4051-0.4071   95% CI: 0.4281-0.4295
> >
> > Inference:
> > 5-level page table shows increase in page-fault latency but it does
> > not significantly impact other benchmarks.
>
> 5% regression on lmbench is a NONO.

Yeah, that's a biggy.

In our testing (on Intel HW) we didn't see any significant difference
between 4- and 5-level paging. But we were focused on TLB fill latency,
both on bare metal and in VMs. Maybe something is wrong in the fault path?

It requires a closer look.

Shivank, could you share how you run lat_pagefault? What file size? How
parallel do you run it?

It would also be nice to get perf traces. Maybe it is purely a SW issue.

> 5-level page tables add a cost in every hardware page table walk. That's
> a matter of fact and there is absolutely no reason to inflict this cost
> on everyone.
>
> The solution to this to make the 5-level mechanics smarter by evaluating
> whether the machine has enough memory to require 5-level tables and
> select the depth at boot time.

Let's understand the reason first.

The risk with your proposal is that 5-level paging will not get any
testing and will rot over time.

I would like to keep it on, if possible.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov