Hi,

[adding Kristina, who is in charge of Linux pointer authentication]

On Tue, Jan 29, 2019 at 11:08:19AM +0000, Alex Bennée wrote:
> Hi,
>
> Following up on yesterday's discussion on IRC I thought I'd better
> report on my findings in the permanent record so things don't get lost.
>
> As I tend to periodically rebuild my test kernels from the current
> state of linux.git I occasionally run into these things. My test
> invocation is:
>
>   qemu-system-aarch64 -machine type=virt,virtualization=on \
>     -display none -m 4096 -serial mon:stdio \
>     -kernel ../../kernel-v8-plain.build/arch/arm64/boot/Image \
>     -append 'console=ttyAMA0 panic=-1' -no-reboot -cpu max
>
> The kernel is essentially a defconfig kernel with a bunch of the VIRTIO
> device drivers built-in for when I actually boot a more complex setup
> with disks and drives. However this is a boot test so doesn't really
> matter.
>
> The -machine type=virt,virtualization=on enables our virt machine model
> with EL2 turned on. As there is no BIOS involved the kernel is invoked
> directly at EL2.
>
> The -cpu max enabled a cortex-a57 + whatever extra features we've
> enabled in QEMU so far. It won't match any "real" CPU but it should be
> architecturally correct in so far we implement prerequisite features for
> any given feature. The cpuid feature bits should also be correct as we
> test them internally in QEMU to enable features.

Just to check, does this enable VHE?

> The breakage is the kernel never boots (no output on serial port) and on
> attaching with gdb I found it stuck in:
>
>   (gdb) bt
>   #0 0xffffff8010a9e480 in overflow_stack ()
>   Backtrace stopped: not enough registers or memory available to unwind further
>
> If I turn on exception tracing it looks like we go into an exception
> loop.

As mentioned on IRC, this looks very odd, since overflow_stack is a data
pointer, not code. I can't presently see how we could branch here.

If you pass the kernel 'earlycon keep_bootcon', do you get any output?
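
As a rough, untested sketch, that would be your invocation above with the
two options appended to the -append string. The script below only prints
the resulting command so it can be checked before running; the print_cmd
helper and the variable names are made up for illustration, and the Image
path is simply the one from your report:

```shell
#!/bin/sh
# Sketch only: the boot-test invocation from the report, with
# 'earlycon keep_bootcon' added to the kernel command line.
# KERNEL, APPEND and print_cmd are illustrative names, not an existing tool.
KERNEL=../../kernel-v8-plain.build/arch/arm64/boot/Image
APPEND='console=ttyAMA0 panic=-1 earlycon keep_bootcon'

# Print the command instead of executing it, so it can be reviewed first.
print_cmd() {
    printf '%s\n' "qemu-system-aarch64 -machine type=virt,virtualization=on \
-display none -m 4096 -serial mon:stdio \
-kernel $KERNEL -append '$APPEND' -no-reboot -cpu max"
}

print_cmd
```

Everything other than the -append string is unchanged, so any difference
in behaviour should come down to the early console alone.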
> On the QEMU side this breakage comes in at:
>
>   commit 1ce32e47db52e3511132c7104770eae65d412144 (HEAD, refs/bisect/bad)
>   Author: Richard Henderson <richard.henderson@xxxxxxxxxx>
>   Date: Mon Jan 21 10:23:13 2019 +0000
>
>       target/arm: Enable PAuth for -cpu max
>
>       Reviewed-by: Peter Maydell <peter.maydell@xxxxxxxxxx>
>       Signed-off-by: Richard Henderson <richard.henderson@xxxxxxxxxx>
>       Message-id: 20190108223129.5570-30-richard.henderson@xxxxxxxxxx
>       Signed-off-by: Peter Maydell <peter.maydell@xxxxxxxxxx>
>
> and as you would expect the system boots fine with -cpu cortex-a57
>
> On the kernel side it breaks at:
>
>   commit 04ca3204fa09f5f55c8f113b0072004a7b364ff4
>   Author: Mark Rutland <mark.rutland@xxxxxxx>
>   Date: Fri Dec 7 18:39:30 2018 +0000
>
>       arm64: enable pointer authentication
>
>       Now that all the necessary bits are in place for userspace, add the
>       necessary Kconfig logic to allow this to be enabled.
>
>       Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
>       Signed-off-by: Kristina Martsenko <kristina.martsenko@xxxxxxx>
>       Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>       Cc: Will Deacon <will.deacon@xxxxxxx>
>       Signed-off-by: Will Deacon <will.deacon@xxxxxxx>
>
> So predictably we failed at enabling PAuth somewhere between the kernel
> and QEMU.
>
> I'm guessing the kernel so far has been tested on the fast model with a
> full chain of TF, UEFI and kernel?

The kernel has been tested on a fast model with the Linux bootwrapper:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/

Kristina, could you confirm whether or not it's been tested with
ATF+UEFI?

> I think Richard's tests were without EL2 enabled.
>
> So in the case that the kernel boots in EL2 is it expecting anyone else
> to deal with Pauth exceptions or should it be able to cope with an
> enabled Pauth but no firmware underneath it?

So long as the highest implemented exception level is EL2, the kernel
should handle that itself.
During boot we'll configure HCR_EL2.{API,APK} in el2_setup().
From that point onwards, there should be no traps for pointer
authentication functionality from EL1, AFAICT.

> Either we've got something wrong or we'll need to rethink what features
> the user can have enabled by -cpu max on a direct kernel boot.

It's not immediately clear to me when precisely things are going wrong,
so I think we need to narrow that down first. For example, it's not
clear whether a trap is being taken, or something is unexpectedly
behaving as UNDEF.

Is it possible to watch the exception vectors to see if/when an
exception is taken, and from where?

Thanks,
Mark.