On Mon, Nov 16, 2020 at 08:28:23AM -0800, Dave Hansen wrote:
> On 11/16/20 7:54 AM, Matthew Wilcox wrote:
> > It gets even more complicated with CPUs with multiple levels of TLB
> > which support different TLB entry sizes.  My CPU reports:
> >
> > TLB info
> >  Instruction TLB: 2M/4M pages, fully associative, 8 entries
> >  Instruction TLB: 4K pages, 8-way associative, 64 entries
> >  Data TLB: 1GB pages, 4-way set associative, 4 entries
> >  Data TLB: 4KB pages, 4-way associative, 64 entries
> >  Shared L2 TLB: 4KB/2MB pages, 6-way associative, 1536 entries
>
> It's even "worse" on recent AMD systems.  Those will coalesce multiple
> adjacent PTEs into a single TLB entry.  I think Alphas did something
> like this back in the day with an opt-in.

I debated mentioning that ;-)  We can detect in software whether that's
_possible_, but we can't detect whether it's *done* it.  I heard it
sometimes takes several faults on the 4kB entries for the CPU to decide
that it's beneficial to use a 32kB TLB entry.  But this is all rumour.

> Anyway, the changelog should probably replace:
>
> > This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate TLB
> > page sizes.
>
> with something more like:
>
>    This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate page
>    table mapping sizes.
>
> That's really the best we can do from software without digging into
> microarchitecture-specific events.

I mean this is perf.  Digging into microarch specific events is what it
does ;-)