[PATCH 0/6] Fix TLB invalidation on arm64

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all,

[+linux-arch since the end of this may be applicable to other architectures]

Commit 24fe1b0efad4fcdd ("arm64: Remove unnecessary ISBs from
set_{pte,pmd,pud") removed ISB instructions immediately following updates to
the page table, on the grounds that they are not required by the
architecture and a DSB alone is sufficient to ensure that subsequent data
accesses use the new translation:

  DDI0487E_a, B2-128:

  | ... no instruction that appears in program order after the DSB instruction
  | can alter any state of the system or perform any part of its functionality
  | until the DSB completes other than:
  |
  | * Being fetched from memory and decoded
  | * Reading the general-purpose, SIMD and floating-point, Special-purpose, or
  |   System registers that are directly or indirectly read without causing
  |   side-effects.

However, the same document also states the following:

  DDI0487E_a, B2-125:

  | DMB and DSB instructions affect reads and writes to the memory system
  | generated by Load/Store instructions and data or unified cache maintenance
  | instructions being executed by the PE. Instruction fetches or accesses
  | caused by a hardware translation table access are not explicit accesses.

which appears to claim that the DSB alone is insufficient. Unfortunately,
some CPU designers have followed the second clause above, whereas in Linux
we've been relying on the first. This means that our mapping sequence:

	MOV	X0, <valid pte>
	STR	X0, [Xptep]	// Store new PTE to page table
	DSB	ISHST
	LDR	X1, [X2]	// Translates using the new PTE

can actually raise a translation fault on the load instruction because the
translation can be performed speculatively before the page table update and
then marked as "faulting" by the CPU. For user PTEs, this is ok because we
can handle the spurious fault, but for kernel PTEs and intermediate table
entries this results in a panic().

We can fix this by reverting 24fe1b0efad4fcdd, but the fun doesn't stop
there. If we consider the unmap case, then a similar constraint applies to
ordering subsequent memory accesses after the completion of the TLB
invalidation, so we also need to add an ISB instruction to
__flush_tlb_kernel_pgtable(). For user addresses, the exception return
provides the necessary context synchronisation.

This then raises an interesting question: if an ISB is required after a TLBI
instruction to prevent speculative translation of subsequent instructions,
how is this speculation prevented on concurrent CPUs that receive the
broadcast TLB invalidation message? Sending and completing a broadcast TLB
invalidation message does not imply execution of an ISB on the remote CPU,
however it /does/ require that the remote CPU will no longer make use of any
old translations because otherwise we wouldn't be able to guarantee that an
unmapped page could no longer be modified. In this regard, receiving a TLB
invalidation is in some ways stronger than sending one (where you need the
ISB).

So far, so good, but the final piece of the puzzle isn't quite so rosy.

*** Other architecture maintainers -- start here! ***

In the case that one CPU maps a page and then sets a flag to tell another
CPU:

	CPU 0
	-----

	MOV	X0, <valid pte>
	STR	X0, [Xptep]	// Store new PTE to page table
	DSB	ISHST
	ISB
	MOV	X1, #1
	STR	X1, [Xflag]	// Set the flag

	CPU 1
	-----

loop:	LDAR	X0, [Xflag]	// Poll flag with Acquire semantics
	CBZ	X0, loop
	LDR	X1, [X2]	// Translates using the new PTE

then the final load on CPU 1 can raise a translation fault for the same
reasons as mentioned at the start of this description. In reality, code
such as:

	CPU 0				CPU 1
	-----				-----
	spin_lock(&lock);		spin_lock(&lock);
	*ptr = vmalloc(size);		if (*ptr)
	spin_unlock(&lock);			foo = **ptr;
					spin_unlock(&lock);

will not trigger the fault because there is an address dependency on
CPU1 which prevents the speculative translation. However, more exotic
code where the virtual address is known ahead of time, such as:

	CPU 0				CPU 1
	-----				-----
	spin_lock(&lock);		spin_lock(&lock);
	set_fixmap(0, paddr, prot);	if (mapped)
	mapped = true;				foo = *fix_to_virt(0);
	spin_unlock(&lock);		spin_unlock(&lock);

could fault. This can be avoided by any of:

	* Introducing broadcast TLB maintenance on the map path
	* Adding a DSB;ISB sequence after checking a flag which indicates
	  that a virtual address is now mapped
	* Handling the spurious fault

Given that we have never observed a problem in the concurrent case under
Linux and future revisions of the architecture are being tightened so that
translation table walks are effectively ordered in the same way as explicit
memory accesses, we no longer treat spurious kernel faults as fatal if the
page table indicates that the access was valid.

Anyway, this patch series attempts to implement some of this and I plan
to queue it for 5.4.

Will

Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Marc Zyngier <maz@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>

--->8

Will Deacon (6):
  Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
  arm64: tlb: Ensure we execute an ISB following walk cache invalidation
  arm64: mm: Add ISB instruction to set_pgd()
  arm64: sysreg: Add some field definitions for PAR_EL1
  arm64: mm: Ignore spurious translation faults taken from the kernel
  arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F

 arch/arm64/include/asm/pgtable.h  | 13 ++++++++++---
 arch/arm64/include/asm/sysreg.h   |  3 +++
 arch/arm64/include/asm/tlbflush.h |  1 +
 arch/arm64/kvm/hyp/switch.c       |  2 +-
 arch/arm64/mm/fault.c             | 33 +++++++++++++++++++++++++++++++++
 5 files changed, 48 insertions(+), 4 deletions(-)

-- 
2.11.0




[Index of Archives]     [Linux Kernel]     [Kernel Newbies]     [x86 Platform Driver]     [Netdev]     [Linux Wireless]     [Netfilter]     [Bugtraq]     [Linux Filesystems]     [Yosemite Discussion]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]

  Powered by Linux