Re: Intermittent guest kernel crashes with v4.5-rc6.


On 02/03/16 14:59, Shanker Donthineni wrote:
> Hi Marc,
> 
> Thanks for your quick reply.
> 
> On 03/02/2016 08:16 AM, Marc Zyngier wrote:
>> On 02/03/16 13:56, Shanker Donthineni wrote:
>>> For some reason, the v4.5-rc6 kernel is not stable for guest machines on
>>> Qualcomm server platforms. We are getting IABT translation faults while
>>> booting the guest kernel. The problem disappears with the following code
>>> snippet (inserting a "dsb ish" instruction just before switching to the
>>> EL1 guest). I am using the v4.5-rc6 kernel for both host and guest
>>> machines.
>>>
>>> Please let me know if you have any thoughts or ideas for tracing this
>>> problem.
>>>
>>> --- a/arch/arm64/kvm/hyp/entry.S
>>> +++ b/arch/arm64/kvm/hyp/entry.S
>>> @@ -88,6 +88,7 @@ ENTRY(__guest_enter)
>>>           ldp     x0, x1, [sp], #16
>>>
>>>           // Do not touch any register after this!
>>> +       dsb ish
>>>           eret
>>>    ENDPROC(__guest_enter)
>>>
>>>
>>> I am using the QEMU command below to launch the guest machine:
>>>
>>> qemu-system-aarch64 -machine type=virt,accel=kvm,gic-version=3 \
>>>     -cpu "host" -smp cpus=1,maxcpus=1 -m 256M -serial stdio \
>>>     -kernel /boot/Image -initrd /boot/rootfs.cpio.gz \
>>>     -append 'earlycon=pl011,0x09000000 console=ttyAMA0,115200 root=/dev/ram'
>>>
>>>
>>> Guest machine crash log messages:
>>>
>>> [    0.000000] Booting Linux on physical CPU 0x0
>>> [    0.000000] Boot CPU: AArch64 Processor [510f2811]
>>> [    0.000000] Bad mode in Synchronous Abort handler detected, code 0x8600000f -- IABT (current EL)
>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.rc6+
>>> [    0.000000] task: ffffffc000d52200 ti: ffffffc000d44000 task.ti: ffffffc000d44000
>>> [    0.000000] PC is at early_init_dt_scan_root+0x28/0x94
>>> [    0.000000] LR is at of_scan_flat_dt+0x9c/0xd0
>>> [    0.000000] pc : [<ffffffc000cb32e8>] lr : [<ffffffc000cb3248>] pstate: 800003c5
>>> [    0.000000] sp : ffffffc000d47e80
>>> [    0.000000] x29: ffffffc000d47e80 x28: 0000000000000000
>>>
>> If you're getting a prefetch abort, it would be interesting to find out
>> what instruction is there, whether the page is mapped at stage-2 or not,
>> what are the stage-2 permissions... Basically, a full description of the
>> memory state.
>>
>> Also, does it work if you do a "dsb ishst" instead?
>>
>> Thanks,
>>
>> 	M.
> 
> Most of the time it faults at ldr/str instructions. After the IABT, I
> verified the stage-1 page and the corresponding stage-2 page attributes
> (SH, AP, PERM), the PA, etc., and everything matches perfectly. I am very
> confident that the stage-1/stage-2 MMU page tables are correct.
> 
> The "dsb ishst" instruction also fixes the problem.
> 
> One more interesting observation: if I retry the instruction fetch that
> caused the IABT, the second fetch succeeds and I don't see an IABT. I
> used the experimental code below to test this.
> 
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -346,6 +346,7 @@ el1_sync:
>          b.eq    el1_undef
>          cmp     x24, #ESR_ELx_EC_BREAKPT_CUR    // debug exception in EL1
>          b.ge    el1_dbg
> +       kernel_exit 1
>          b       el1_inv
>   el1_da:
> 
> 

OK, that's pretty scary, especially considering that we don't have a DSB
on that path. Do you ever see it exploding at EL0?
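
For reference, the "code 0x8600000f" printed in the guest log is the ESR_EL1 value. The following is a minimal sketch of decoding it, assuming a hypothetical `decode_esr` shell helper (not part of any existing tool) and the ESR_ELx field layout from the Arm ARM: EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and, for aborts, the fault status code in ISS bits [5:0].

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): split an ESR_ELx value into
# EC [31:26], IL [25], ISS [24:0], and the abort fault status code ISS [5:0].
decode_esr() {
    esr=$(( $1 ))
    ec=$(( (esr >> 26) & 0x3f ))
    il=$(( (esr >> 25) & 0x1 ))
    iss=$(( esr & 0x1ffffff ))
    ifsc=$(( iss & 0x3f ))
    # For status codes 0x00-0x0f, bits [3:2] give the fault type and
    # bits [1:0] the translation table lookup level.
    level=$(( ifsc & 0x3 ))
    case $(( ifsc >> 2 )) in
        0) kind="address size fault, level $level" ;;
        1) kind="translation fault, level $level" ;;
        2) kind="access flag fault, level $level" ;;
        3) kind="permission fault, level $level" ;;
        *) kind="other fault class" ;;
    esac
    printf 'EC=0x%02x IL=%d ISS=0x%07x IFSC=0x%02x (%s)\n' \
        "$ec" "$il" "$iss" "$ifsc" "$kind"
}

decode_esr 0x8600000f
```

Under that layout, EC 0x21 matches the "IABT (current EL)" text in the log, and IFSC 0x0f falls in the 0b0011xx group, which the Arm ARM assigns to permission faults (here at level 3) rather than translation faults.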

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


