Re: Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled

Hi,

[adding Kristina, who is in charge of Linux pointer authentication]

On Tue, Jan 29, 2019 at 11:08:19AM +0000, Alex Bennée wrote:
> Hi,
> 
> Following up on yesterday's discussion on IRC I thought I'd better
> report on my findings in the permanent record so things don't get lost.
> 
> As I tend to periodically rebuild my test kernels from the current
> state of linux.git I occasionally run into these things. My test
> invocation is:
> 
>   qemu-system-aarch64 -machine type=virt,virtualization=on \
>                       -display none -m 4096 -serial mon:stdio \
>                       -kernel ../../kernel-v8-plain.build/arch/arm64/boot/Image \
>                       -append 'console=ttyAMA0 panic=-1' -no-reboot -cpu max
>
> The kernel is essentially a defconfig kernel with a bunch of the VIRTIO
> device drivers built-in for when I actually boot a more complex setup
> with disks and drives. However this is a boot test so doesn't really
> matter.
> 
> The -machine type=virt,virtualization=on enables our virt machine model
> with EL2 turned on. As there is no BIOS involved the kernel is invoked
> directly at EL2.
> 
> The -cpu max enables a cortex-a57 + whatever extra features we've
> enabled in QEMU so far. It won't match any "real" CPU but it should be
> architecturally correct insofar as we implement prerequisite features for
> any given feature. The cpuid feature bits should also be correct as we
> test them internally in QEMU to enable features.

Just to check, does this enable VHE?

> The breakage is the kernel never boots (no output on serial port) and on
> attaching with gdb I found it stuck in:
> 
>   (gdb) bt
>   #0  0xffffff8010a9e480 in overflow_stack ()
>   Backtrace stopped: not enough registers or memory available to unwind further
> 
> If I turn on exception tracing it looks like we go into an exception
> loop.

As mentioned on IRC, this looks very odd, since overflow_stack is a data
pointer, not code. I can't presently see how we could branch here.

If you pass the kernel 'earlycon keep_bootcon', do you get any output?

> On the QEMU side this breakage comes in at:
> 
>   commit 1ce32e47db52e3511132c7104770eae65d412144 (HEAD, refs/bisect/bad)
>   Author: Richard Henderson <richard.henderson@xxxxxxxxxx>
>   Date:   Mon Jan 21 10:23:13 2019 +0000
> 
>       target/arm: Enable PAuth for -cpu max
> 
>       Reviewed-by: Peter Maydell <peter.maydell@xxxxxxxxxx>
>       Signed-off-by: Richard Henderson <richard.henderson@xxxxxxxxxx>
>       Message-id: 20190108223129.5570-30-richard.henderson@xxxxxxxxxx
>       Signed-off-by: Peter Maydell <peter.maydell@xxxxxxxxxx>
> 
> and as you would expect the system boots fine with -cpu cortex-a57
> 
> On the kernel side it breaks at:
> 
>   commit 04ca3204fa09f5f55c8f113b0072004a7b364ff4
>   Author: Mark Rutland <mark.rutland@xxxxxxx>
>   Date:   Fri Dec 7 18:39:30 2018 +0000
> 
>       arm64: enable pointer authentication
> 
>       Now that all the necessary bits are in place for userspace, add the
>       necessary Kconfig logic to allow this to be enabled.
> 
>       Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
>       Signed-off-by: Kristina Martsenko <kristina.martsenko@xxxxxxx>
>       Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>       Cc: Will Deacon <will.deacon@xxxxxxx>
>       Signed-off-by: Will Deacon <will.deacon@xxxxxxx>
> 
> So predictably we failed at enabling PAuth somewhere between the kernel
> and QEMU.
> 
> I'm guessing the kernel so far has been tested on the fast model with a
> full chain of TF, UEFI and kernel?

The kernel has been tested on a fast model with the Linux bootwrapper:

https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/

Kristina, could you confirm whether or not it's been tested with
ATF+UEFI?

> I think Richard's tests were without EL2 enabled.
> 
> So in the case that the kernel boots in EL2 is it expecting anyone else
> to deal with PAuth exceptions or should it be able to cope with an
> enabled PAuth but no firmware underneath it?

So long as the highest implemented exception level is EL2, the kernel
should handle that itself. During boot we'll configure HCR_EL2.{API,APK}
in el2_setup().

From that point onwards, there should be no traps for pointer
authentication functionality from EL1, AFAICT.
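
For reference, a minimal sketch of what that configuration amounts to,
assuming the Armv8.3 HCR_EL2 layout (APK at bit 40, API at bit 41). The
macro and function names below are illustrative only and are not the
kernel's own; the helper just checks a raw HCR_EL2 value, however it was
obtained:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, taken from the Armv8.3 HCR_EL2 layout. */
#define HCR_EL2_APK (UINT64_C(1) << 40) /* 1: key register accesses are not trapped */
#define HCR_EL2_API (UINT64_C(1) << 41) /* 1: PAuth instructions are not trapped */

/*
 * Hypothetical helper: returns true when both trap controls are set,
 * i.e. neither key register accesses nor PAuth instructions should
 * trap to EL2.
 */
static bool pauth_traps_disabled(uint64_t hcr_el2)
{
    const uint64_t mask = HCR_EL2_API | HCR_EL2_APK;

    return (hcr_el2 & mask) == mask;
}
```

If either bit reads back as clear after el2_setup(), a trap to EL2 would
be the expected result of the first PAuth instruction or key write.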

> Either we've got something wrong or we'll need to rethink what features
> the user can have enabled by -cpu max on a direct kernel boot.

It's not immediately clear to me when precisely things are going wrong,
so I think we need to narrow that down first. For example, it's not
clear whether a trap is being taken, or something is unexpectedly
behaving as UNDEF.

Is it possible to watch the exception vectors to see if/when an
exception is taken, and from where?
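
If the syndrome of the first exception can be captured, the exception
class should separate those two cases. A rough sketch, assuming the
architectural ESR_ELx encoding (EC in bits [31:26], 0x00 for an unknown
reason such as UNDEF, 0x09 for a trapped pointer authentication
instruction); the enum and function names are made up for illustration:

```c
#include <stdint.h>

#define ESR_ELX_EC_SHIFT 26
#define ESR_ELX_EC_MASK  0x3fu

/* Exception class values assumed from the architectural ESR_ELx encoding. */
#define ESR_EC_UNKNOWN 0x00u /* unknown reason, e.g. an UNDEF instruction */
#define ESR_EC_PAC     0x09u /* trapped pointer authentication instruction */

/* Hypothetical classification used only in this sketch. */
enum first_fault { FAULT_UNDEF, FAULT_PAUTH_TRAP, FAULT_OTHER };

/* Extract the exception class field from a raw syndrome value. */
static unsigned int esr_ec(uint64_t esr)
{
    return (unsigned int)((esr >> ESR_ELX_EC_SHIFT) & ESR_ELX_EC_MASK);
}

/* Map a syndrome value onto the two cases of interest, or "other". */
static enum first_fault classify_esr(uint64_t esr)
{
    switch (esr_ec(esr)) {
    case ESR_EC_UNKNOWN:
        return FAULT_UNDEF;
    case ESR_EC_PAC:
        return FAULT_PAUTH_TRAP;
    default:
        return FAULT_OTHER;
    }
}
```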

Thanks,
Mark.
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm



