On Fri, Nov 22, 2019 at 09:51:16AM +0000, Will Deacon wrote:
> From: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
>
> commit 94bb804e1e6f0a9a77acf20d7c70ea141c6c821e upstream.
>
> A number of our uaccess routines ('__arch_clear_user()' and
> '__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they
> encounter an unhandled fault whilst accessing userspace.
>
> For CPUs implementing both hardware PAN and UAO, this bug has no effect
> when both extensions are in use by the kernel.
>
> For CPUs implementing hardware PAN but not UAO, this means that a kernel
> using hardware PAN may execute portions of code with PAN inadvertently
> disabled, opening us up to potential security vulnerabilities that rely
> on userspace access from within the kernel which would usually be
> prevented by this mechanism. In other words, parts of the kernel run the
> same way as they would on a CPU without PAN implemented/emulated at all.
>
> For CPUs not implementing hardware PAN and instead relying on software
> emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately
> much worse. Calling 'schedule()' with software PAN disabled means that
> the next task will execute in the kernel using the page-table and ASID
> of the previous process even after 'switch_mm()', since the actual
> hardware switch is deferred until return to userspace. At this point, or
> if there is a intermediate call to 'uaccess_enable()', the page-table
> and ASID of the new process are installed. Sadly, due to the changes
> introduced by KPTI, this is not an atomic operation and there is a very
> small window (two instructions) where the CPU is configured with the
> page-table of the old task and the ASID of the new task; a speculative
> access in this state is disastrous because it would corrupt the TLB
> entries for the new task with mappings from the previous address space.
>
> As Pavel explains:
>
> | I was able to reproduce memory corruption problem on Broadcom's SoC
> | ARMv8-A like this:
> |
> | Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
> | stack is accessed and copied.
> |
> | The test program performed the following on every CPU and forking
> | many processes:
> |
> |     unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
> |                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> |     map[0] = getpid();
> |     sched_yield();
> |     if (map[0] != getpid()) {
> |         fprintf(stderr, "Corruption detected!");
> |     }
> |     munmap(map, PAGE_SIZE);
> |
> | From time to time I was getting map[0] to contain pid for a
> | different process.
>
> Ensure that PAN is re-enabled when returning after an unhandled user
> fault from our uaccess routines.
>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Reviewed-by: Mark Rutland <mark.rutland@xxxxxxx>
> Tested-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Fixes: 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access Never")
> Signed-off-by: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
> [will: rewrote commit message]
> [will: backport for 4.9.y stable kernels]

Thanks for this and the 4.4.y backport, both now queued up.

greg k-h