Re: [RFC 15/20] mm: detect deferred TLB flushes in vma granularity

> On Jan 30, 2021, at 4:11 PM, Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
> 
> From: Nadav Amit <namit@xxxxxxxxxx>
> 
> Currently, deferred TLB flushes are detected in the mm granularity: if
> there is any deferred TLB flush in the entire address space due to NUMA
> migration, pte_accessible() in x86 would return true, and
> ptep_clear_flush() would require a TLB flush. This would happen even if
> the PTE resides in a completely different vma.

[ snip ]

> +static inline void read_defer_tlb_flush_gen(struct mmu_gather *tlb)
> +{
> +	struct mm_struct *mm = tlb->mm;
> +	u64 mm_gen;
> +
> +	/*
> +	 * Any change of PTE before calling __track_deferred_tlb_flush() must be
> +	 * performed using an RMW atomic operation that provides a memory barrier,
> +	 * such as ptep_modify_prot_start().  The barrier ensures the PTEs are
> +	 * written before the current generation is read, synchronizing
> +	 * (implicitly) with flush_tlb_mm_range().
> +	 */
> +	smp_mb__after_atomic();
> +
> +	mm_gen = atomic64_read(&mm->tlb_gen);
> +
> +	/*
> +	 * This condition checks both for the first deferred TLB flush and for
> +	 * other TLB flushes that became pending or were executed after the last
> +	 * table that we updated. In the latter case, we are going to skip a
> +	 * generation, which would lead to a full TLB flush. This should
> +	 * therefore not cause correctness issues, and should not induce
> +	 * overheads, since in TLB storms it is better to perform a full TLB
> +	 * flush anyhow.
> +	 */
> +	if (mm_gen != tlb->defer_gen) {
> +		VM_BUG_ON(mm_gen < tlb->defer_gen);
> +
> +		tlb->defer_gen = inc_mm_tlb_gen(mm);
> +	}
> +}

Andy’s comments managed to make me realize this code is wrong. We must
call inc_mm_tlb_gen(mm) every time.

Otherwise, a CPU might have seen the old tlb_gen and recorded it in its
local cpu_tlbstate on a context switch. If the process was not running on
that CPU when the TLB flush was issued, no IPI will be sent to it.
Therefore, a later switch_mm_irqs_off() back to the process will not flush
the local TLB, and stale entries can survive.
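
To make the scenario concrete, here is a rough userspace model of the
race. It is only a sketch: the model_* types and helpers are hypothetical
stand-ins rather than kernel code, and the real switch_mm_irqs_off() and
flush paths are considerably more involved. It assumes a CPU flushes on
switch-in only when its recorded generation is behind the mm's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for mm_struct, mmu_gather and per-CPU TLB state. */
struct model_mm  { uint64_t tlb_gen; };
struct model_tlb { struct model_mm *mm; uint64_t defer_gen; };
struct model_cpu { uint64_t local_tlb_gen; bool stale; };

static uint64_t model_inc_mm_tlb_gen(struct model_mm *mm)
{
	return ++mm->tlb_gen;
}

/* The logic from the quoted patch: bump only if the generation moved. */
static void read_defer_gen_conditional(struct model_tlb *tlb)
{
	uint64_t mm_gen = tlb->mm->tlb_gen;

	if (mm_gen != tlb->defer_gen) {
		assert(mm_gen >= tlb->defer_gen);
		tlb->defer_gen = model_inc_mm_tlb_gen(tlb->mm);
	}
}

/* The proposed fix: bump the generation on every deferred flush. */
static void read_defer_gen_always(struct model_tlb *tlb)
{
	tlb->defer_gen = model_inc_mm_tlb_gen(tlb->mm);
}

/* Simplified switch-in: flush only if the CPU is behind the mm. */
static void model_switch_in(struct model_cpu *cpu, const struct model_mm *mm)
{
	if (cpu->local_tlb_gen < mm->tlb_gen) {
		cpu->stale = false;		/* local TLB flushed */
		cpu->local_tlb_gen = mm->tlb_gen;
	}
}

/* Returns true if the CPU ends up with a stale TLB entry. */
static bool model_race_leaves_stale_entry(bool always_inc)
{
	struct model_mm mm = { .tlb_gen = 1 };
	struct model_tlb tlb = { .mm = &mm, .defer_gen = 0 };
	struct model_cpu cpu = { .local_tlb_gen = 1, .stale = false };
	void (*read_gen)(struct model_tlb *) =
		always_inc ? read_defer_gen_always : read_defer_gen_conditional;

	read_gen(&tlb);			/* first deferred flush: gen 1 -> 2 */
	model_switch_in(&cpu, &mm);	/* CPU catches up and records gen 2 */

	cpu.stale = true;		/* second PTE change; CPU cached the old PTE */
	read_gen(&tlb);			/* conditional variant does not bump */

	/* CPU is switched out here, so the flush sends it no IPI. */
	model_switch_in(&cpu, &mm);	/* switch back to the process */
	return cpu.stale;
}
```

In this model, model_race_leaves_stale_entry(false) returns true (the
conditional increment leaves a stale entry), while
model_race_leaves_stale_entry(true) returns false.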

I need to think about whether there is a better solution. Multiple calls to
inc_mm_tlb_gen() during deferred flushes would trigger a full TLB flush
instead of one that is specific to the ranges, once the flush actually takes
place. On x86 it’s practically a non-issue, since any update of more than
33 entries or so would cause a full TLB flush anyhow, but this is still ugly.
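
Roughly, the decision I have in mind looks like the sketch below. This is
a simplified model under my own assumptions, not the actual x86 flush
code: model_needs_full_flush() and MODEL_FLUSH_CEILING are hypothetical
names, and the ceiling of 33 is just the "33 entries or so" figure from
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ceiling; stands in for the "33 entries or so" threshold. */
#define MODEL_FLUSH_CEILING 33UL

/*
 * Simplified model: a ranged flush is only possible when the request
 * moves the CPU forward by exactly one generation, that generation is
 * the mm's current one, and the range is small enough.
 */
static bool model_needs_full_flush(uint64_t local_gen, uint64_t mm_gen,
				   uint64_t new_gen, unsigned long nr_entries)
{
	/* Large ranges are flushed in full regardless of generations. */
	if (nr_entries > MODEL_FLUSH_CEILING)
		return true;

	/* A skipped generation means the ranges cannot be trusted. */
	if (new_gen != local_gen + 1 || new_gen != mm_gen)
		return true;

	return false;
}
```

With two inc_mm_tlb_gen() calls before a single flush, a CPU at
generation 1 sees new_gen == 3, so even an 8-entry update falls back to a
full flush in this model.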
