On 12/30/24 11:53, Rik van Riel wrote:
> With AMD TCE (translation cache extensions) only the intermediate mappings
> that cover the address range zapped by INVLPG / INVLPGB get invalidated,
> rather than all intermediate mappings getting zapped at every TLB invalidation.
>
> This can help reduce the TLB miss rate, by keeping more intermediate
> mappings in the cache.
>
> From the AMD manual:
>
> Translation Cache Extension (TCE) Bit. Bit 15, read/write. Setting this bit
> to 1 changes how the INVLPG, INVLPGB, and INVPCID instructions operate on
> TLB entries. When this bit is 0, these instructions remove the target PTE
> from the TLB as well as all upper-level table entries that are cached
> in the TLB, whether or not they are associated with the target PTE.
> When this bit is set, these instructions will remove the target PTE and
> only those upper-level entries that lead to the target PTE in
> the page table hierarchy, leaving unrelated upper-level entries intact.
>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> ---
>  arch/x86/kernel/cpu/amd.c |  8 ++++++++
>  arch/x86/mm/tlb.c         | 10 +++++++---
>  2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 226b8fc64bfc..4dc42705aaca 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -1143,6 +1143,14 @@ static void cpu_detect_tlb_amd(struct cpuinfo_x86 *c)
>
>  	/* Max number of pages INVLPGB can invalidate in one shot */
>  	invlpgb_count_max = (edx & 0xffff) + 1;
> +
> +	/* If supported, enable translation cache extensions (TCE) */
> +	cpuid(0x80000001, &eax, &ebx, &ecx, &edx);
> +	if (ecx & BIT(17)) {

Back to my comment from patch #4, you can put this under the
cpu_feature_enabled() check and just set it.

> +		u64 msr = native_read_msr(MSR_EFER);;
> +		msr |= BIT(15);
> +		wrmsrl(MSR_EFER, msr);

msr_set_bit() ?  (Rough sketch of what I mean at the bottom of this mail.)

Thanks,
Tom

> +	}
>  }
>
>  static const struct cpu_dev amd_cpu_dev = {
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 454a370494d3..585d0731ca9f 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -477,7 +477,7 @@ static void broadcast_tlb_flush(struct flush_tlb_info *info)
>  	if (info->stride_shift > PMD_SHIFT)
>  		maxnr = 1;
>
> -	if (info->end == TLB_FLUSH_ALL) {
> +	if (info->end == TLB_FLUSH_ALL || info->freed_tables) {
>  		invlpgb_flush_single_pcid(kern_pcid(asid));
>  		/* Do any CPUs supporting INVLPGB need PTI? */
>  		if (static_cpu_has(X86_FEATURE_PTI))
> @@ -1110,7 +1110,7 @@ static void flush_tlb_func(void *info)
>  	 *
>  	 * The only question is whether to do a full or partial flush.
>  	 *
> -	 * We do a partial flush if requested and two extra conditions
> +	 * We do a partial flush if requested and three extra conditions
>  	 * are met:
>  	 *
>  	 * 1. f->new_tlb_gen == local_tlb_gen + 1. We have an invariant that
> @@ -1137,10 +1137,14 @@ static void flush_tlb_func(void *info)
>  	 *    date. By doing a full flush instead, we can increase
>  	 *    local_tlb_gen all the way to mm_tlb_gen and we can probably
>  	 *    avoid another flush in the very near future.
> +	 *
> +	 * 3. No page tables were freed. If page tables were freed, a full
> +	 *    flush ensures intermediate translations in the TLB get flushed.
>  	 */
>  	if (f->end != TLB_FLUSH_ALL &&
>  	    f->new_tlb_gen == local_tlb_gen + 1 &&
> -	    f->new_tlb_gen == mm_tlb_gen) {
> +	    f->new_tlb_gen == mm_tlb_gen &&
> +	    !f->freed_tables) {
>  		/* Partial flush */
>  		unsigned long addr = f->start;
>
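For clarity, the kind of thing I had in mind above -- completely untested,
and it assumes an _EFER_TCE define for bit 15 gets added to msr-index.h
next to the other EFER bits:

	/* Enable TCE if the CPU advertises it (CPUID 0x80000001 ECX bit 17) */
	if (cpu_feature_enabled(X86_FEATURE_TCE))
		msr_set_bit(MSR_EFER, _EFER_TCE);

That avoids the extra CPUID call and the open-coded read-modify-write of
EFER; msr_set_bit() only writes the MSR when the bit isn't already set.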