On Tue, Feb 11, 2025 at 04:08:04PM -0500, Rik van Riel wrote:
> +static void broadcast_tlb_flush(struct flush_tlb_info *info)
> +{
> +	bool pmd = info->stride_shift == PMD_SHIFT;
> +	unsigned long maxnr = invlpgb_count_max;
> +	unsigned long asid = info->mm->context.global_asid;
> +	unsigned long addr = info->start;
> +	unsigned long nr;
> +
> +	/* Flushing multiple pages at once is not supported with 1GB pages. */
> +	if (info->stride_shift > PMD_SHIFT)
> +		maxnr = 1;

How does this work? Normally, if we get a 1GB range, we'll iterate on the
stride and INVLPG each one (just like any other stride).

Should you not instead either force the stride down to PMD level or force a
full flush?

> +
> +	/*
> +	 * TLB flushes with INVLPGB are kicked off asynchronously.
> +	 * The inc_mm_tlb_gen() guarantees page table updates are done
> +	 * before these TLB flushes happen.
> +	 */
> +	if (info->end == TLB_FLUSH_ALL) {
> +		invlpgb_flush_single_pcid_nosync(kern_pcid(asid));
> +		/* Do any CPUs supporting INVLPGB need PTI? */
> +		if (static_cpu_has(X86_FEATURE_PTI))
> +			invlpgb_flush_single_pcid_nosync(user_pcid(asid));
> +	} else do {
> +		/*
> +		 * Calculate how many pages can be flushed at once; if the
> +		 * remainder of the range is less than one page, flush one.
> +		 */
> +		nr = min(maxnr, (info->end - addr) >> info->stride_shift);
> +		nr = max(nr, 1);
> +
> +		invlpgb_flush_user_nr_nosync(kern_pcid(asid), addr, nr, pmd);
> +		/* Do any CPUs supporting INVLPGB need PTI? */
> +		if (static_cpu_has(X86_FEATURE_PTI))
> +			invlpgb_flush_user_nr_nosync(user_pcid(asid), addr, nr, pmd);
> +
> +		addr += nr << info->stride_shift;
> +	} while (addr < info->end);
> +
> +	finish_asid_transition(info);
> +
> +	/* Wait for the INVLPGBs kicked off above to finish. */
> +	tlbsync();
> +}