On 12/30/24 11:53, Rik van Riel wrote:
> With AMD TCE (translation cache extensions) only the intermediate mappings
> that cover the address range zapped by INVLPG / INVLPGB get invalidated,
> rather than all intermediate mappings getting zapped at every TLB invalidation.
>
> This can help reduce the TLB miss rate, by keeping more intermediate
> mappings in the cache.
>
> From the AMD manual:
>
> Translation Cache Extension (TCE) Bit. Bit 15, read/write. Setting this bit
> to 1 changes how the INVLPG, INVLPGB, and INVPCID instructions operate on
> TLB entries. When this bit is 0, these instructions remove the target PTE
> from the TLB as well as all upper-level table entries that are cached
> in the TLB, whether or not they are associated with the target PTE.
> When this bit is set, these instructions will remove the target PTE and
> only those upper-level entries that lead to the target PTE in
> the page table hierarchy, leaving unrelated upper-level entries intact.
>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> ---
>  arch/x86/kernel/cpu/amd.c |  8 ++++++++
>  arch/x86/mm/tlb.c         | 10 +++++++---
>  2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 226b8fc64bfc..4dc42705aaca 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -1143,6 +1143,14 @@ static void cpu_detect_tlb_amd(struct cpuinfo_x86 *c)
>
>  	/* Max number of pages INVLPGB can invalidate in one shot */
>  	invlpgb_count_max = (edx & 0xffff) + 1;
> +
> +	/* If supported, enable translation cache extensions (TCE) */
> +	cpuid(0x80000001, &eax, &ebx, &ecx, &edx);
> +	if (ecx & BIT(17)) {

Back to my comment from patch #4, you can put this under the
cpu_feature_enabled() check and just set it.

> +		u64 msr = native_read_msr(MSR_EFER);;
> +		msr |= BIT(15);
> +		wrmsrl(MSR_EFER, msr);

msr_set_bit() ?  (Rough sketch of what I mean at the bottom of this mail.)

Thanks,
Tom

> +	}
>  }
>
>  static const struct cpu_dev amd_cpu_dev = {
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 454a370494d3..585d0731ca9f 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -477,7 +477,7 @@ static void broadcast_tlb_flush(struct flush_tlb_info *info)
>  	if (info->stride_shift > PMD_SHIFT)
>  		maxnr = 1;
>
> -	if (info->end == TLB_FLUSH_ALL) {
> +	if (info->end == TLB_FLUSH_ALL || info->freed_tables) {
>  		invlpgb_flush_single_pcid(kern_pcid(asid));
>  		/* Do any CPUs supporting INVLPGB need PTI? */
>  		if (static_cpu_has(X86_FEATURE_PTI))
> @@ -1110,7 +1110,7 @@ static void flush_tlb_func(void *info)
>  	 *
>  	 * The only question is whether to do a full or partial flush.
>  	 *
> -	 * We do a partial flush if requested and two extra conditions
> +	 * We do a partial flush if requested and three extra conditions
>  	 * are met:
>  	 *
>  	 * 1. f->new_tlb_gen == local_tlb_gen + 1. We have an invariant that
> @@ -1137,10 +1137,14 @@ static void flush_tlb_func(void *info)
>  	 *    date. By doing a full flush instead, we can increase
>  	 *    local_tlb_gen all the way to mm_tlb_gen and we can probably
>  	 *    avoid another flush in the very near future.
> +	 *
> +	 * 3. No page tables were freed. If page tables were freed, a full
> +	 *    flush ensures intermediate translations in the TLB get flushed.
>  	 */
>  	if (f->end != TLB_FLUSH_ALL &&
>  	    f->new_tlb_gen == local_tlb_gen + 1 &&
> -	    f->new_tlb_gen == mm_tlb_gen) {
> +	    f->new_tlb_gen == mm_tlb_gen &&
> +	    !f->freed_tables) {
>  		/* Partial flush */
>  		unsigned long addr = f->start;
>
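For clarity, the kind of thing I had in mind above -- completely untested,
and it assumes an _EFER_TCE define for bit 15 gets added to msr-index.h
next to the other EFER bits:

	/* Enable TCE if the CPU advertises it (CPUID 0x80000001 ECX bit 17) */
	if (cpu_feature_enabled(X86_FEATURE_TCE))
		msr_set_bit(MSR_EFER, _EFER_TCE);

That avoids the extra CPUID call and the open-coded read-modify-write of
EFER; msr_set_bit() only writes the MSR when the bit isn't already set.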