Hi Marc and Christoffer,

Below are the steps I took and the complete crash dump.

1. Start the host with all CPUs in Hyp mode.
2. Start the guest OS.
3. Offline and hotplug all of the secondary CPUs.
4. Verify that the guest OS is still alive and start one more guest OS.
5. Halt the first guest OS.
6. Quit the qemu process. The crash happens now.

[  123.700000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  123.700000] pgd = c0003000
[  123.700000] [00000000] *pgd=80000080004003, *pmd=00000000
[  123.710000] Internal error: Oops: 207 [#1] PREEMPT SMP ARM
[  123.710000] CPU: 1    Not tainted  (3.8.0-rc7-00196-g063f56c-dirty #269)
[  123.720000] PC is at unmap_range+0x9c/0x2f4
[  123.720000] LR is at kvm_free_stage2_pgd+0x30/0x4c
[  123.730000] pc : [<c00145b0>]    lr : [<c0014c2c>]    psr: 80000013
[  123.730000] sp : eeb53e60  ip : 00000000  fp : ee80c000
[  123.740000] r10: ee40e808  r9 : 00000000  r8 : 00000000
[  123.750000] r7 : ae1db003  r6 : c0000000  r5 : ee80c000  r4 : 00000000
[  123.750000] r3 : 00000000  r2 : ae1db003  r1 : 00000000  r0 : 00000000
[  123.760000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  123.770000] Control: 30c5387d  Table: ae9a5400  DAC: 55555555
[  123.770000] Process qemu-system-arm (pid: 2678, stack limit = 0xeeb52238)
[  123.780000] Stack: (0xeeb53e60 to 0xeeb54000)
[  123.780000] 3e60: eeb53e84 00040003 c051b5c8 c051b5c0 00000001 000c0000 c0000000 00000000
[  123.790000] 3e80: c0afc7e0 00000000 00000100 ee40e800 00000000 ee9a5e00 00000002 ee40e808
[  123.800000] 3ea0: 00000001 c0014c2c 00000000 00000100 ee40e800 ee8df500 eef63c78 c00129c8
[  123.810000] 3ec0: ee40e800 ee8df500 eef63c78 c000eb6c ee706780 eec4a330 eef63c78 00000000
[  123.820000] 3ee0: 00000008 ef2c5310 ee706788 c000f068 c000f058 c00bebcc 00000000 00000000
[  123.820000] 3f00: ef336854 ef2ad000 ef336580 c0513644 c0019b28 eeb52000 00000000 c0044810
[  123.830000] 3f20: ef336580 26212621 ee8df500 ef336580 ef336864 ee8df500 ee8df548 c0030cb8
[  123.840000] 3f40: 00000001 ef336580 eeb52000 00000000 eeb53f64 26212621 ef3670c0 ef367218
[  123.850000] 3f60: 00000001 ee6ab600 00000000 eeb52000 ee5f94c4 c0019b28 eeb52000 00000000
[  123.860000] 3f80: 00000001 c003132c 00000000 000703c2 b6d56760 b6d56760 000000f8 c00313a4
[  123.860000] 3fa0: 00000000 c0019980 000703c2 b6d56760 00000000 000703ae b6c3f4c0 00000000
[  123.870000] 3fc0: 000703c2 b6d56760 b6d56760 000000f8 00251804 00000001 be9773f9 00000001
[  123.880000] 3fe0: 000000f8 be97734c b6ce7ce3 b6c8f1e6 600f0030 00000000 ffffffff ffffffff
[  123.890000] [<c00145b0>] (unmap_range+0x9c/0x2f4) from [<c0014c2c>] (kvm_free_stage2_pgd+0x30/0x4c)
[  123.900000] [<c0014c2c>] (kvm_free_stage2_pgd+0x30/0x4c) from [<c00129c8>] (kvm_arch_destroy_vm+0xc/0x38)
[  123.910000] [<c00129c8>] (kvm_arch_destroy_vm+0xc/0x38) from [<c000eb6c>] (kvm_put_kvm+0xec/0x150)
[  123.920000] [<c000eb6c>] (kvm_put_kvm+0xec/0x150) from [<c000f068>] (kvm_vcpu_release+0x10/0x18)
[  123.930000] [<c000f068>] (kvm_vcpu_release+0x10/0x18) from [<c00bebcc>] (__fput+0x88/0x1dc)
[  123.930000] [<c00bebcc>] (__fput+0x88/0x1dc) from [<c0044810>] (task_work_run+0xac/0xe8)
[  123.940000] [<c0044810>] (task_work_run+0xac/0xe8) from [<c0030cb8>] (do_exit+0x22c/0x82c)
[  123.950000] [<c0030cb8>] (do_exit+0x22c/0x82c) from [<c003132c>] (do_group_exit+0x48/0xb0)
[  123.960000] [<c003132c>] (do_group_exit+0x48/0xb0) from [<c00313a4>] (__wake_up_parent+0x0/0x18)
[  123.970000] Code: e1927003 0afffff0 e7e80658 e3a0c000 (e1cc20d0)
[  123.970000] ---[ end trace 8f0d0eaefb305781 ]---
[  123.980000] Fixing recursive fault but reboot is needed!

Thanks,
Giridhar

On 04/19/2013 12:08 AM, Christoffer Dall wrote:
> On Thu, Apr 18, 2013 at 7:40 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>> On 18/04/13 15:16, Giridhar Maruthy wrote:
>> Hi Giridhar,
>>> Thanks a lot for pointing me at the series. I did apply the series
>>> and got cpu hotplug to work successfully.
>> Ah, good to know. Thanks for testing.
>>> However, I have the following doubts.
>>>
>>> 1.
>>> Though the guest does not crash, when exiting qemu, I get the
>>> following crash dump. I have not yet looked into the details.
> I haven't been able to reproduce this. Can you tell us the exact steps
> you take to reproduce?
>>> [  547.870000] [<c00145b0>] (unmap_range+0x9c/0x2f4) from [<c0014c2c>] (kvm_free_stage2_pgd+0x30/0x4c)
>>> [  547.880000] [<c0014c2c>] (kvm_free_stage2_pgd+0x30/0x4c) from [<c00129c8>] (kvm_arch_destroy_vm+0xc/0x38)
>>> [  547.890000] [<c00129c8>] (kvm_arch_destroy_vm+0xc/0x38) from [<c000eb6c>] (kvm_put_kvm+0xec/0x150)
>>> [  547.900000] [<c000eb6c>] (kvm_put_kvm+0xec/0x150) from [<c000f068>] (kvm_vcpu_release+0x10/0x18)
>>> [  547.910000] [<c000f068>] (kvm_vcpu_release+0x10/0x18) from [<c00bebcc>] (__fput+0x88/0x1dc)
>>> [  547.920000] [<c00bebcc>] (__fput+0x88/0x1dc) from [<c0044810>] (task_work_run+0xac/0xe8)
>>> [  547.920000] [<c0044810>] (task_work_run+0xac/0xe8) from [<c0030cb8>] (do_exit+0x22c/0x82c)
>>> [  547.930000] [<c0030cb8>] (do_exit+0x22c/0x82c) from [<c003132c>] (do_group_exit+0x48/0xb0)
>>> [  547.940000] [<c003132c>] (do_group_exit+0x48/0xb0) from [<c003b618>] (get_signal_to_deliver+0x278/0x504)
>>> [  547.950000] [<c003b618>] (get_signal_to_deliver+0x278/0x504) from [<c001c8e4>] (do_signal+0x74/0x460)
>>> [  547.960000] [<c001c8e4>] (do_signal+0x74/0x460) from [<c001d150>] (do_work_pending+0x64/0xac)
>>> [  547.970000] [<c001d150>] (do_work_pending+0x64/0xac) from [<c00199c0>] (work_pending+0xc/0x20)
>>> [  547.980000] Code: e1927003 0afffff0 e7e80658 e3a0c000 (e1cc20d0)
>>> [  547.980000] ---[ end trace 05d3020cd57fa289 ]---
>>> [  547.990000] Fixing recursive fault but reboot is needed!
>> It probably means we're having issues with the Stage-2 page refcounts.
>> Can you share the whole dump (I think there's a few additional lines
>> before what you quoted)?
>>> 2. I applied the kvm-arm-fixes branch from Christoffer's tree
>>> (github.com/virtualopensystems/linux-kvm-arm) and then applied the v4
>>> series of "ARM: KVM: Revamping the HYP init code for fun and profit".
>>> I ran into some merge conflicts, so I manually edited and applied the
>>> patches. Should I be including any more dependent patches?
>> You'd be better off using the following branch:
>>
>> git://github.com/columbia/linux-kvm-arm.git kvm-arm-for-next
>>
>> as it should contain all you need. I haven't tested it yet, though.
> So I just tried this on vexpress TC2, and when I hotplug cpu1, I get
> the crash below. Is this actually supposed to work at this point?
>
> Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x803c1880
> unexpected data abort in Hyp mode at: 0x0
> [<800208f4>] (unwind_backtrace+0x0/0xf8) from [<803bb360>] (panic+0x90/0x1e4)
> [<803bb360>] (panic+0x90/0x1e4) from [<80012b48>] (cpu_init_hyp_mode+0x10/0x6c)
> [<80012b48>] (cpu_init_hyp_mode+0x10/0x6c) from [<80012bc8>] (hyp_init_cpu_notify+0x24/0x2c)
> [<80012bc8>] (hyp_init_cpu_notify+0x24/0x2c) from [<8004b900>] (notifier_call_chain+0x44/0x84)
> [<8004b900>] (notifier_call_chain+0x44/0x84) from [<8002ebf8>] (__cpu_notify+0x28/0x44)
> [<8002ebf8>] (__cpu_notify+0x28/0x44) from [<803b8d20>] (secondary_start_kernel+0xd4/0x11c)
> [<803b8d20>] (secondary_start_kernel+0xd4/0x11c) from [<803b6dec>] (vexpress_cpu_die+0xc/0xa0)
> CPU0: stopping
> [<800208f4>] (unwind_backtrace+0x0/0xf8) from [<8001f078>] (handle_IPI+0xfc/0x130)
> [<8001f078>] (handle_IPI+0xfc/0x130) from [<800085c4>] (gic_handle_irq+0x54/0x5c)
> [<800085c4>] (gic_handle_irq+0x54/0x5c) from [<80019f00>] (__irq_svc+0x40/0x50)
> Exception stack(0x8052bf60 to 0x8052bfa8)
> bf60: 0000001f 805323ec 00000000 00000000 8052a000 80554948 8052a000 80554948
> bf80: 8052a000 412fc0f1 803c4a2c 00000000 00000000 8052bfa8 8001b584 8001b564
> bfa0: 600f0013 ffffffff
> [<80019f00>] (__irq_svc+0x40/0x50) from [<8001b564>] (cpu_idle+0xa0/0xec)
> [<8001b564>] (cpu_idle+0xa0/0xec) from [<804f67ac>] (start_kernel+0x29c/0x2ec)
>
> -Christoffer

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
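[Editor's note: for readers wanting to reproduce the offline-and-hotplug part of the steps above, it is driven through the standard sysfs CPU hotplug interface. The script below is only a sketch of that interface, not the exact commands used in the report; the helper names are hypothetical, and it assumes a Linux host built with CONFIG_HOTPLUG_CPU and root privileges.]

```shell
#!/bin/sh
# Sketch of "offline and hotplug all of the secondary CPUs" (step 3).
# Writing 0 or 1 to /sys/devices/system/cpu/cpuN/online takes CPU N
# down or brings it back up.

SYSFS_CPU=/sys/devices/system/cpu

# Hypothetical helper: path to the hotplug control file for CPU $1.
online_file() {
    echo "$SYSFS_CPU/cpu$1/online"
}

# Offline, then re-online, every secondary CPU. cpu0 usually has no
# "online" file, so the glob naturally skips it.
cycle_secondaries() {
    for f in "$SYSFS_CPU"/cpu[1-9]*/online; do
        [ -e "$f" ] || continue   # no secondary CPUs exposed
        echo 0 > "$f"             # offline the CPU
        echo 1 > "$f"             # hotplug it back in
    done
}
```

Running `cycle_secondaries` as root between starting and halting the guests mirrors the reported sequence; whether it triggers the same Oops will of course depend on the kernel under test.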