Re: [PATCH 0/6] Fix TLB invalidation on arm64

On Tue, Aug 27, 2019 at 02:18:12PM +0100, Will Deacon wrote:
> Hi all,

Hi Will,

For the series:

Reviewed-by: Mark Rutland <mark.rutland@xxxxxxx>

Thanks,
Mark.

> 
> [+linux-arch since the end of this may be applicable to other architectures]
> 
> Commit 24fe1b0efad4fcdd ("arm64: Remove unnecessary ISBs from
> set_{pte,pmd,pud}") removed ISB instructions immediately following updates to
> the page table, on the grounds that they are not required by the
> architecture and a DSB alone is sufficient to ensure that subsequent data
> accesses use the new translation:
> 
>   DDI0487E_a, B2-128:
> 
>   | ... no instruction that appears in program order after the DSB instruction
>   | can alter any state of the system or perform any part of its functionality
>   | until the DSB completes other than:
>   |
>   | * Being fetched from memory and decoded
>   | * Reading the general-purpose, SIMD and floating-point, Special-purpose, or
>   |   System registers that are directly or indirectly read without causing
>   |   side-effects.
> 
> However, the same document also states the following:
> 
>   DDI0487E_a, B2-125:
> 
>   | DMB and DSB instructions affect reads and writes to the memory system
>   | generated by Load/Store instructions and data or unified cache maintenance
>   | instructions being executed by the PE. Instruction fetches or accesses
>   | caused by a hardware translation table access are not explicit accesses.
> 
> which appears to claim that the DSB alone is insufficient. Unfortunately,
> some CPU designers have followed the second clause above, whereas in Linux
> we've been relying on the first. This means that our mapping sequence:
> 
> 	MOV	X0, <valid pte>
> 	STR	X0, [Xptep]	// Store new PTE to page table
> 	DSB	ISHST
> 	LDR	X1, [X2]	// Translates using the new PTE
> 
> can actually raise a translation fault on the load instruction because the
> translation can be performed speculatively before the page table update and
> then marked as "faulting" by the CPU. For user PTEs, this is ok because we
> can handle the spurious fault, but for kernel PTEs and intermediate table
> entries this results in a panic().
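> 
> For kernel mappings, then, the map path needs an ISB as well as the DSB.
> As a sketch (not a verbatim copy of the patch) of the shape set_pte()
> takes with the ISB in place, where dsb()/isb() and pte_valid_not_user()
> are the kernel's existing helpers:
> 
> 	static inline void set_pte(pte_t *ptep, pte_t pte)
> 	{
> 		WRITE_ONCE(*ptep, pte);
> 
> 		/*
> 		 * Only valid kernel mappings need the barriers here; user
> 		 * mappings can take and fix up the spurious fault instead.
> 		 */
> 		if (pte_valid_not_user(pte)) {
> 			dsb(ishst);	/* publish the new entry to the walker */
> 			isb();		/* discard any stale speculative walk */
> 		}
> 	}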
> 
> We can fix this by reverting 24fe1b0efad4fcdd, but the fun doesn't stop
> there. If we consider the unmap case, then a similar constraint applies to
> ordering subsequent memory accesses after the completion of the TLB
> invalidation, so we also need to add an ISB instruction to
> __flush_tlb_kernel_pgtable(). For user addresses, the exception return
> provides the necessary context synchronisation.
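> 
> With the extra ISB, __flush_tlb_kernel_pgtable() ends up looking
> something like the sketch below (__tlbi() and __TLBI_VADDR() are the
> existing helpers in tlbflush.h):
> 
> 	static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr)
> 	{
> 		unsigned long addr = __TLBI_VADDR(kaddr, 0);
> 
> 		dsb(ishst);		/* complete the table update first */
> 		__tlbi(vaae1is, addr);	/* invalidate the walk cache entry */
> 		dsb(ish);		/* wait for the invalidation... */
> 		isb();			/* ...then resteer speculation */
> 	}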
> 
> This then raises an interesting question: if an ISB is required after a TLBI
> instruction to prevent speculative translation of subsequent instructions,
> how is this speculation prevented on concurrent CPUs that receive the
> broadcast TLB invalidation message? Sending and completing a broadcast TLB
> invalidation message does not imply execution of an ISB on the remote
> CPU; however, it /does/ require that the remote CPU will no longer make
> use of any old translations, because otherwise we wouldn't be able to
> guarantee that an unmapped page could no longer be modified. In this
> regard, receiving a TLB invalidation is in some ways stronger than
> sending one (where you need the ISB).
> 
> So far, so good, but the final piece of the puzzle isn't quite so rosy.
> 
> *** Other architecture maintainers -- start here! ***
> 
> In the case that one CPU maps a page and then sets a flag to tell another
> CPU:
> 
> 	CPU 0
> 	-----
> 
> 	MOV	X0, <valid pte>
> 	STR	X0, [Xptep]	// Store new PTE to page table
> 	DSB	ISHST
> 	ISB
> 	MOV	X1, #1
> 	STR	X1, [Xflag]	// Set the flag
> 
> 	CPU 1
> 	-----
> 
> loop:	LDAR	X0, [Xflag]	// Poll flag with Acquire semantics
> 	CBZ	X0, loop
> 	LDR	X1, [X2]	// Translates using the new PTE
> 
> then the final load on CPU 1 can raise a translation fault for the same
> reasons as mentioned at the start of this description. In reality, code
> such as:
> 
> 	CPU 0				CPU 1
> 	-----				-----
> 	spin_lock(&lock);		spin_lock(&lock);
> 	*ptr = vmalloc(size);		if (*ptr)
> 	spin_unlock(&lock);			foo = **ptr;
> 					spin_unlock(&lock);
> 
> will not trigger the fault because there is an address dependency on
> CPU 1 which prevents the speculative translation. However, more exotic
> code where the virtual address is known ahead of time, such as:
> 
> 	CPU 0				CPU 1
> 	-----				-----
> 	spin_lock(&lock);		spin_lock(&lock);
> 	set_fixmap(0, paddr, prot);	if (mapped)
> 	mapped = true;				foo = *fix_to_virt(0);
> 	spin_unlock(&lock);		spin_unlock(&lock);
> 
> could fault. This can be avoided by any of:
> 
> 	* Introducing broadcast TLB maintenance on the map path
> 	* Adding a DSB;ISB sequence after checking a flag which indicates
> 	  that a virtual address is now mapped (as sketched after this list)
> 	* Handling the spurious fault
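> 
> As an illustration of the second option, applied to the fixmap example
> above (hypothetical code; "mapped", "lock" and "foo" are the names from
> that example), CPU 1 would do:
> 
> 	spin_lock(&lock);
> 	if (mapped) {
> 		dsb(ish);	/* wait for the page-table update to complete */
> 		isb();		/* discard translations speculated before now */
> 		foo = *fix_to_virt(0);
> 	}
> 	spin_unlock(&lock);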
> 
> Given that we have never observed a problem in the concurrent case under
> Linux, and that future revisions of the architecture are being tightened
> so that translation table walks are effectively ordered in the same way
> as explicit memory accesses, we no longer treat spurious kernel faults as
> fatal if the page table indicates that the access was valid.
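> 
> In essence, the handler re-walks the faulting address with AT S1E1R and
> inspects PAR_EL1: if the translation now succeeds, the fault was
> spurious and can be ignored. A simplified sketch of the fault.c change
> (the real patch also checks the fault status code):
> 
> 	static bool is_spurious_el1_translation_fault(unsigned long addr)
> 	{
> 		unsigned long flags;
> 		u64 par;
> 
> 		local_irq_save(flags);
> 		asm volatile("at s1e1r, %0" :: "r" (addr));
> 		isb();
> 		par = read_sysreg(par_el1);
> 		local_irq_restore(flags);
> 
> 		/* PAR_EL1.F clear => the translation now succeeds */
> 		return !(par & SYS_PAR_EL1_F);
> 	}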
> 
> Anyway, this patch series attempts to implement some of this and I plan
> to queue it for 5.4.
> 
> Will
> 
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> 
> --->8
> 
> Will Deacon (6):
>   Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
>   arm64: tlb: Ensure we execute an ISB following walk cache invalidation
>   arm64: mm: Add ISB instruction to set_pgd()
>   arm64: sysreg: Add some field definitions for PAR_EL1
>   arm64: mm: Ignore spurious translation faults taken from the kernel
>   arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F
> 
>  arch/arm64/include/asm/pgtable.h  | 13 ++++++++++---
>  arch/arm64/include/asm/sysreg.h   |  3 +++
>  arch/arm64/include/asm/tlbflush.h |  1 +
>  arch/arm64/kvm/hyp/switch.c       |  2 +-
>  arch/arm64/mm/fault.c             | 33 +++++++++++++++++++++++++++++++++
>  5 files changed, 48 insertions(+), 4 deletions(-)
> 
> -- 
> 2.11.0
> 


