Re: Intermittent guest kernel crashes with v4.5-rc6.

Hi Marc,

Thanks for your quick reply.

On 03/02/2016 08:16 AM, Marc Zyngier wrote:
On 02/03/16 13:56, Shanker Donthineni wrote:
For some reason v4.5-rc6 kernel is not stable for guest machines on
Qualcomm server platforms.
We are getting IABT translation faults while booting the guest kernel.
The problem disappears with
the following code snippet (insert "dsb ish" instruction just before
switching to EL1 guest). I am
using v4.5-rc6 kernel for both host and guest machines.

Please let me know if you have any thoughts or ideas for tracing this
problem.

--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -88,6 +88,7 @@ ENTRY(__guest_enter)
          ldp     x0, x1, [sp], #16

          // Do not touch any register after this!
+       dsb ish
          eret
   ENDPROC(__guest_enter)


Using below QEMU command for launching guest machine:

qemu-system-aarch64 -machine type=virt,accel=kvm,gic-version=3  \
-cpu "host" -smp cpus=1,maxcpus=1 -m 256M -serial stdio \
-kernel /boot/Image -initrd /boot/rootfs.cpio.gz \
-append 'earlycon=earlycon=pl011,0x09000000  \
console=ttyAMA0,115200 root=/dev/ram'


Guest machine crash log messages:

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Boot CPU: AArch64 Processor [510f2811]
[    0.000000] Bad mode in Synchronous Abort handler detected, code
0x8600000f -- IABT (current EL)
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.rc6+
[    0.000000] task: ffffffc000d52200 ti: ffffffc000d44000 task.ti:
ffffffc000d44000
[    0.000000] PC is at early_init_dt_scan_root+0x28/0x94
[    0.000000] LR is at of_scan_flat_dt+0x9c/0xd0
[    0.000000] pc : [<ffffffc000cb32e8>] lr : [<ffffffc000cb3248>]
pstate: 800003c5
[    0.000000] sp : ffffffc000d47e80
[    0.000000] x29: ffffffc000d47e80 x28: 0000000000000000

If you're getting a prefetch abort, it would be interesting to find out
what instruction is there, whether the page is mapped at stage-2 or not,
what are the stage-2 permissions... Basically, a full description of the
memory state.

Also, does it work if you do a "dsb ishst" instead?

Thanks,

	M.

Most of the time it faults at ldr/str instructions. After the IABT, I verified the stage-1 page and the corresponding stage-2 page attributes (SH, AP, PERM), the PA, etc., and everything matches perfectly. I am very confident that the stage-1/stage-2 MMU page tables are correct.
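
As an illustration of the kind of check described above, here is a minimal sketch in C that splits a raw stage-2 page descriptor into the fields mentioned (shareability, access permissions, execute-never, output address). It assumes a 4KB granule, a 48-bit output address, and the ARMv8.0 layout with XN in bit 54. The struct and function names are hypothetical and are not taken from the kernel or from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded stage-2 page descriptor fields. */
struct s2_attrs {
    bool     valid;    /* bits [1:0] == 0b11: valid level 3 page entry */
    unsigned memattr;  /* bits [5:2]: stage-2 memory attributes */
    unsigned s2ap;     /* bits [7:6]: 0 none, 1 read-only, 2 write-only, 3 read/write */
    unsigned sh;       /* bits [9:8]: 0 non-shareable, 2 outer, 3 inner shareable */
    bool     af;       /* bit 10: access flag */
    bool     xn;       /* bit 54: execute-never (ARMv8.0 layout, assumed) */
    uint64_t pa;       /* bits [47:12]: output address (4KB granule, assumed) */
};

/* Decode a raw 64-bit stage-2 level 3 page descriptor. */
static inline struct s2_attrs s2_decode(uint64_t desc)
{
    struct s2_attrs a;

    a.valid   = (desc & 0x3) == 0x3;
    a.memattr = (unsigned)((desc >> 2) & 0xf);
    a.s2ap    = (unsigned)((desc >> 6) & 0x3);
    a.sh      = (unsigned)((desc >> 8) & 0x3);
    a.af      = ((desc >> 10) & 0x1) != 0;
    a.xn      = ((desc >> 54) & 0x1) != 0;
    a.pa      = desc & 0x0000fffffffff000ULL;
    return a;
}
```

For example, s2_decode(0x800007ffULL) describes a valid, read/write, inner-shareable, executable mapping of PA 0x80000000 with the access flag set. An instruction fetch through such an entry would not be expected to fault at stage 2.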

The "dsb ishst" instruction also fixes the problem.

One more interesting observation: if I retry the instruction fetch that caused the IABT, the second fetch succeeds and I don't see the IABT again. I used the experimental code below to test this.

--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -346,6 +346,7 @@ el1_sync:
        b.eq    el1_undef
        cmp     x24, #ESR_ELx_EC_BREAKPT_CUR    // debug exception in EL1
        b.ge    el1_dbg
+       kernel_exit 1
        b       el1_inv
 el1_da:
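
For reference when reading the crash log, the syndrome value it prints (0x8600000f) can be decoded by hand. The sketch below is a minimal userspace helper in C, assuming the ARMv8 ESR_ELx layout (EC in bits [31:26], IL in bit 25, fault status code in bits [5:0]). The enum and function names are hypothetical. Under that reading, 0x8600000f has EC 0x21 (instruction abort taken without a change in exception level) and fault status code 0x0f, which the architecture encodes as a level 3 permission fault rather than a translation fault, so the fault type may be worth double-checking.

```c
#include <stdint.h>

/* Hypothetical classification of the fault status code (ISS bits [5:0]). */
enum fsc_type { FSC_OTHER, FSC_TRANSLATION, FSC_ACCESS_FLAG, FSC_PERMISSION };

/* Exception class: ESR_ELx bits [31:26]. */
static inline unsigned esr_ec(uint32_t esr)  { return (esr >> 26) & 0x3f; }

/* Instruction length bit: ESR_ELx bit 25. */
static inline unsigned esr_il(uint32_t esr)  { return (esr >> 25) & 0x1; }

/* Fault status code for instruction/data aborts: ESR_ELx bits [5:0]. */
static inline unsigned esr_fsc(uint32_t esr) { return esr & 0x3f; }

/* Map the upper four bits of the fault status code to a fault type. */
static inline enum fsc_type fsc_type(unsigned fsc)
{
    switch (fsc & 0x3c) {
    case 0x04: return FSC_TRANSLATION;  /* 0b0001LL */
    case 0x08: return FSC_ACCESS_FLAG;  /* 0b0010LL */
    case 0x0c: return FSC_PERMISSION;   /* 0b0011LL */
    default:   return FSC_OTHER;
    }
}

/* Lookup level (LL); meaningful for the three fault types named above. */
static inline unsigned fsc_level(unsigned fsc) { return fsc & 0x3; }
```

With these helpers, esr_ec(0x8600000f) is 0x21, and fsc_type(esr_fsc(0x8600000f)) is FSC_PERMISSION with fsc_level 3.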


--
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project



