On 02/03/16 14:59, Shanker Donthineni wrote:
> Hi Marc,
>
> Thanks for your quick reply.
>
> On 03/02/2016 08:16 AM, Marc Zyngier wrote:
>> On 02/03/16 13:56, Shanker Donthineni wrote:
>>> For some reason v4.5-rc6 kernel is not stable for guest machines on
>>> Qualcomm server platforms.
>>> We are getting IABT translation faults while booting the guest kernel.
>>> The problem disappears with
>>> the following code snippet (insert "dsb ish" instruction just before
>>> switching to EL1 guest). I am
>>> using v4.5-rc6 kernel for both host and guest machines.
>>>
>>> Please let me know if you have any thoughts or ideas for tracing this
>>> problem.
>>>
>>> --- a/arch/arm64/kvm/hyp/entry.S
>>> +++ b/arch/arm64/kvm/hyp/entry.S
>>> @@ -88,6 +88,7 @@ ENTRY(__guest_enter)
>>>         ldp     x0, x1, [sp], #16
>>>
>>>         // Do not touch any register after this!
>>> +       dsb ish
>>>         eret
>>>  ENDPROC(__guest_enter)
>>>
>>>
>>> Using below QEMU command for launching guest machine:
>>>
>>> qemu-system-aarch64 -machine type=virt,accel=kvm,gic-version=3 \
>>>     -cpu "host" -smp cpus=1,maxcpus=1 -m 256M -serial stdio \
>>>     -kernel /boot/Image -initrd /boot/rootfs.cpio.gz \
>>>     -append 'earlycon=earlycon=pl011,0x09000000 \
>>>     console=ttyAMA0,115200 root=/dev/ram'
>>>
>>>
>>> Guest machine crash log messages:
>>>
>>> [    0.000000] Booting Linux on physical CPU 0x0
>>> [    0.000000] Boot CPU: AArch64 Processor [510f2811]
>>> [    0.000000] Bad mode in Synchronous Abort handler detected, code
>>> 0x8600000f -- IABT (current EL)
>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.rc6+
>>> [    0.000000] task: ffffffc000d52200 ti: ffffffc000d44000 task.ti:
>>> ffffffc000d44000
>>> [    0.000000] PC is at early_init_dt_scan_root+0x28/0x94
>>> [    0.000000] LR is at of_scan_flat_dt+0x9c/0xd0
>>> [    0.000000] pc : [<ffffffc000cb32e8>] lr : [<ffffffc000cb3248>]
>>> pstate: 800003c5
>>> [    0.000000] sp : ffffffc000d47e80
>>> [    0.000000] x29: ffffffc000d47e80 x28: 0000000000000000
>>>
>> If you're getting a prefetch
>> abort, it would be interesting to find out
>> what instruction is there, whether the page is mapped at stage-2 or not,
>> what are the stage-2 permissions... Basically, a full description of the
>> memory state.
>>
>> Also, does it work if you do a "dsb ishst" instead?
>>
>> Thanks,
>>
>> M.
>
> Most of the time it is faulting at ldr/str instructions. I have
> verified the stage-1 page and the corresponding stage-2 page
> attributes (SH, AP, PERM), PA etc. after the IABT, and everything
> matches perfectly. I am very confident that the stage-1/stage-2 MMU
> page tables are correct.
>
> The "dsb ishst" instruction also fixes the problem.
>
> One more interesting observation: if I retry the instruction fetch
> that caused the IABT, the second fetch is successful and I don't see
> an IABT. I used the experimental code below to test.
>
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -346,6 +346,7 @@ el1_sync:
>         b.eq    el1_undef
>         cmp     x24, #ESR_ELx_EC_BREAKPT_CUR    // debug exception in EL1
>         b.ge    el1_dbg
> +       kernel_exit 1
>         b       el1_inv
>  el1_da:
>

OK, that's pretty scary, especially considering that we don't have a DSB
on that path.

Do you ever see it exploding at EL0?

Thanks,

        M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
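
For reference, the syndrome code in the quoted crash log (0x8600000f) can be split into its fields by hand. The sketch below is not from the thread. It assumes the standard ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], fault status code in ISS[5:0]); the helper name `decode_esr` and the lookup tables are illustrative only and cover just the abort classes relevant here.

```python
# Minimal ESR_ELx decoder sketch (illustrative, not kernel code).
# Assumed layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].

# Only the abort exception classes relevant to this thread are listed.
EC_NAMES = {
    0x20: "Instruction Abort from a lower EL",
    0x21: "Instruction Abort, current EL",
    0x24: "Data Abort from a lower EL",
    0x25: "Data Abort, current EL",
}

# Upper four bits of the 6-bit fault status code select the fault type;
# the low two bits give the translation table level.
FAULT_KINDS = {
    0b0000: "address size fault",
    0b0001: "translation fault",
    0b0010: "access flag fault",
    0b0011: "permission fault",
}


def decode_esr(esr):
    """Split a 32-bit ESR_ELx value into EC, IL, ISS and fault status."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    result = {"ec": ec, "ec_name": EC_NAMES.get(ec, "other"), "il": il, "iss": iss}
    if ec in EC_NAMES:
        # For instruction and data aborts, ISS[5:0] is the fault status code.
        fsc = iss & 0x3F
        kind = FAULT_KINDS.get(fsc >> 2, "other fault")
        result["fsc"] = fsc
        result["fault"] = f"{kind}, level {fsc & 0x3}"
    return result


if __name__ == "__main__":
    info = decode_esr(0x8600000F)
    print(hex(info["ec"]), info["ec_name"], "/", info["fault"])
```

Under these assumptions the value decodes to EC 0x21 (instruction abort taken at the current EL), matching the "IABT (current EL)" text in the log, with a fault status code of 0x0f, which this table reads as a level 3 permission fault.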