[Bug 55201] New: host panic when "creating guest, doing scp and killing QEMU process" continuously

https://bugzilla.kernel.org/show_bug.cgi?id=55201

           Summary: host panic when "creating guest, doing scp and killing
                    QEMU process" continuously
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.7.0
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
        AssignedTo: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx
        ReportedBy: yongjie.ren@xxxxxxxxx
        Regression: No


Environment:
------------
Host OS (ia32/ia32e/IA64): ia32e
Guest OS (ia32/ia32e/IA64): ia32e
Guest OS Type (Linux/Windows): Linux
kvm.git (next branch) Commit: 3ab66e8a455a4877889c65a848f2fb32be502f2c
qemu-kvm (uq/master) Commit: 3e41a753551a906dd9ed66fb0fc34167a6af3ba0
Host Kernel Version: 3.7.0
Hardware: SandyBridge-EP server system

Bug detailed description:
--------------------------
The host panics when continuously "creating a guest, doing scp, and killing the
QEMU process". I hit this issue around the 268th to 300th iteration of the loop.
I think this is a regression on the kernel side, because we found the following
result.

kvm.git(next branch)  +  qemu-kvm(uq/master branch)  =  result 
3ab66e8a              +  3e41a7535                   =  Bad
45e3cc7d              +  3e41a7535                   =  Good 
For the good case above, the issue could not be reproduced on my side even after
more than 1000 iterations of the loop.


Reproduce steps:
----------------
1. qemu-system-x86_64 -m 1024 -smp 4 -net nic,macaddr=00:12:23:43:53:12 -net
tap,script=/etc/kvm/qemu-ifup -hda rhel6u3.qcow --enable-kvm &
2. Create a file in the guest
3. scp the file to the host
4. Kill the QEMU process of the guest
5. Repeat steps 1 to 4 again and again
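The reproduction loop above can be sketched as a shell script. This is a minimal sketch, not taken from the report: the guest IP address, the SSH login, the test-file size, and the boot delay are all assumptions that would need adjusting for the actual setup. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/bin/bash
# Sketch of the reproduction loop (steps 1-4 above). GUEST_IP, the root SSH
# login, the 100 MB test file, and the 60 s boot delay are hypothetical
# placeholders, not values from the report. Source this file, then call
# repro_loop.

GUEST_IP="${GUEST_IP:-192.168.122.100}"  # hypothetical guest address
ITERATIONS="${ITERATIONS:-1000}"

# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

repro_loop() {
    local i
    for i in $(seq 1 "$ITERATIONS"); do
        echo "=== iteration $i ==="
        # Step 1: boot the guest in the background.
        run qemu-system-x86_64 -m 1024 -smp 4 \
            -net nic,macaddr=00:12:23:43:53:12 \
            -net tap,script=/etc/kvm/qemu-ifup \
            -hda rhel6u3.qcow --enable-kvm &
        local qemu_pid=$!
        run sleep 60    # give the guest time to boot (assumed delay)
        # Steps 2-3: create a file in the guest and scp it back to the host.
        run ssh root@"$GUEST_IP" "dd if=/dev/urandom of=/tmp/testfile bs=1M count=100"
        run scp root@"$GUEST_IP":/tmp/testfile /tmp/
        # Step 4: kill the QEMU process of the guest.
        run kill -9 "$qemu_pid"
        wait "$qemu_pid" 2>/dev/null
    done
}
```

For example, `DRY_RUN=1 ITERATIONS=2 repro_loop` prints the commands for two iterations without touching the host.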

Current result:
----------------
The guest hangs, and the host panics.

Expected result:
----------------
The guest and the host work fine.


Basic root-causing log: (host serial port log)
----------------
NMI backtrace for cpu 25
CPU 25
Pid: 0, comm: swapper/25 Tainted: P             3.7.0 #2 Intel Corporation
S2600CP/S2600CP
RIP: 0010:[<ffffffff8124f0cd>]  [<ffffffff8124f0cd>] intel_idle+0x9e/0xc2
RSP: 0018:ffff88042f1ade08  EFLAGS: 00000046
RAX: 0000000000000030 RBX: 0000000000000010 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff88042f1adfd8 RDI: 0000000000000019
RBP: ffff88042f1ade38 R08: 0000000000000000 R09: 000000000000006d
R10: 0000000000000003 R11: ffff88083f3324c0 R12: 0000000000000004
R13: 0000000000000030 R14: 0000000000000004 R15: 0000000000000019
FS:  0000000000000000(0000) GS:ffff88083f320000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffb9886d0a0 CR3: 0000000001a0b000 CR4: 00000000000427e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/25 (pid: 0, threadinfo ffff88042f1ac000, task ffff88042f181040)
Stack:
 000000192f1adfd8 000000193f32d740 ffff88042f1ade48 ffff88083f339170
 000010290c6bf2d2 ffffffff81332c2c ffff88042f1ade48 ffffffff81332c3e
 ffff88042f1adea8 ffffffff813330f3 ffff880400000004 ffffffff81a44110
Call Trace:
 [<ffffffff81332c2c>] ? disable_cpuidle+0x10/0x10
 [<ffffffff81332c3e>] cpuidle_enter+0x12/0x14
 [<ffffffff813330f3>] cpuidle_wrap_enter+0x2f/0x6d
 [<ffffffff81333141>] cpuidle_enter_tk+0x10/0x12
 [<ffffffff81332c52>] cpuidle_enter_state+0x12/0x3a
 [<ffffffff8133332b>] cpuidle_idle_call+0x12a/0x1df
 [<ffffffff8100916a>] cpu_idle+0x5e/0xa4
 [<ffffffff813f76a6>] start_secondary+0x75/0x77
Code: ff 48 8d 86 38 e0 ff ff 80 e2 08 75 1e 31 d2 48 89 d1 0f 01 c8 0f ae f0
48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e
BUG: soft lockup - CPU#6 stuck for 22s! [qemu-system-x86:57320]
Modules linked in: ext3 jbd vfat fat loop tun kvm_intel nfsv3 nfs_acl nfsv4
auth_rpcgss nfs fscache dns_resolver lockd pci_s]CPU 6
Pid: 57320, comm: qemu-system-x86 Tainted: P             3.7.0 #2 Intel
Corporation S2600CP/S2600CP
RIP: 0010:[<ffffffff810797ce>]  [<ffffffff810797ce>]
smp_call_function_many+0x1cc/0x1dd
RSP: 0018:ffff88082c9a5ce8  EFLAGS: 00000202
RAX: 00000000000000ff RBX: ffffffff81a82f50 RCX: 0000000000000001
RDX: 7fffffffffffffff RSI: 00000000000000ff RDI: 000000000000003f
RBP: ffff88082c9a5d28 R08: 00000000000000c0 R09: ffff88043f6cce08
R10: 0000000000000004 R11: 0000000000000003 R12: 0000000000000296
R13: ffff88083f20cdd0 R14: 000800002d240040 R15: 0000000000000006
FS:  00007f28ffca9840(0000) GS:ffff88043f6c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f28fef664d0 CR3: 000000082cd06000 CR4: 00000000000427e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 57320, threadinfo ffff88082c9a4000, task
ffff88082d240040)
Stack:
 01007f28ee5fcfff ffff88082c9a5d38 ffff88082c9a5d28BUG: soft lockup - CPU#15
stuck for 22s! [python:57408]
Modules linked in: ext3 jbd vfat fat loop tun kvm_intel nfsv3 nfs_acl nfsv4
auth_rpcgss nfs fscache dns_resolver lockd pci_s]CPU 15 Pid: 57408, comm:
python Tainted: P             3.7.0 #2 Intel Corporation S2600CP/S2600CP
RIP: 0010:[<ffffffff810797ce>]  [<ffffffff810797ce>]
smp_call_function_many+0x1cc/0x1dd
RSP: 0018:ffff8808290ab958  EFLAGS: 00000202
RAX: 00000000000000ff RBX: ffffffff81a82f50 RCX: 0000000000000001
RDX: 7fffffffffffffff RSI: 00000000000000ff RDI: 000000000000003f
RBP: ffff8808290ab998 R08: 00000000000000c0 R09: ffff88083f2ece08
R10: 0000000000000004 R11: 0000000000000003 R12: 0000000000000292
R13: ffff88083f20cdd0 R14: 000c00003f2f24c0 R15: 000000000000000f
FS:  00007f7bab739700(0000) GS:ffff88083f2e0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000a13c98 CR3: 000000082c0e7000 CR4: 00000000000427e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process python (pid: 57408, threadinfo ffff8808290aa000, task ffff88082cb83800)
Stack:
 0100000000000001 0000000000000000 ffff88082e60d200 ffffffff8102c8a2
 0000000000000000 0000000000000000 ffffffff81a2f1c0 000000000000ae05
 ffff8808290ab9a8 ffffffff810798d4 ffff8808290ab9e8 ffffffff8107990c
Call Trace:
 [<ffffffff8102c8a2>] ? leave_mm+0x43/0x43
 [<ffffffff810798d4>] smp_call_function+0x1d/0x21
 [<ffffffff8107990c>] on_each_cpu+0x18/0x35
 [<ffffffff8102c7c6>] flush_tlb_kernel_range+0x63/0x65
 [<ffffffff810ee3df>] __purge_vmap_area_lazy+0x138/0x195
 [<ffffffff810f0658>] vm_unmap_aliases+0x15f/0x16e
 [<ffffffff8102aad9>] change_page_attr_set_clr+0xf4/0x365
 [<ffffffff810c9985>] ? __alloc_pages_nodemask+0x183/0x81c
 [<ffffffff8102afb6>] _set_memory_wb+0x2a/0x2c
 [<ffffffff81029cde>] ioremap_change_attr+0x26/0x28
 [<ffffffff8102ba6b>] kernel_map_sync_memtype+0x69/0xb7
 [<ffffffff8102bb7c>] reserve_pfn_range+0xc3/0xdf
 [<ffffffff812a2a39>] ? memory_open+0x66/0x6f
 [<ffffffff8102bbc9>] track_pfn_remap+0x31/0x45
 [<ffffffff810e2432>] remap_pfn_range+0x80/0x36c
 [<ffffffff810ccfe1>] ? lru_cache_add_lru+0x25/0x27
 [<ffffffff810ec684>] ? page_add_new_anon_rmap+0xc8/0xda
 [<ffffffff812a305b>] mmap_mem+0x75/0x87
 [<ffffffff810ea454>] mmap_region+0x2ba/0x4dc
 [<ffffffff81005cef>] ? arch_get_unmapped_area_topdown+0x1cb/0x1ff
 [<ffffffff810ea8be>] do_mmap_pgoff+0x248/0x2a6
 [<ffffffff810d9253>] vm_mmap_pgoff+0x6c/0x8b
 [<ffffffff810e8198>] sys_mmap_pgoff+0xe1/0x114
 [<ffffffff81005d40>] sys_mmap+0x1d/0x21
 [<ffffffff81407192>] system_call_fastpath+0x16/0x1b
Code: 63 28 4c 89 ee 48 c7 c7 d0 41 a0 81 e8 50 6f 38 00 0f ae f0 4c 89 f7 ff
15 58 2b 9a 00 80 7d c7 00 75 04 eb 08 f3 90 f

 ffff8808244ac140
 ffff8808244ac400 00007f28ee4d0000 00007f28ee5fd000 ffff88082d1a0088
 ffff88082c9a5d58 ffffffff8102c80a ffff8808244ac140 0000000000000000
Call Trace:
 [<ffffffff8102c80a>] native_flush_tlb_others+0x29/0x2b
 [<ffffffff8102cb15>] flush_tlb_mm_range+0x1b2/0x1bb
 [<ffffffff810e306e>] tlb_flush_mmu+0x3f/0x7b
 [<ffffffff810e30c1>] tlb_finish_mmu+0x17/0x3c
 [<ffffffff810e7d45>] unmap_region+0xcf/0xe1
 [<ffffffff811411f4>] ? eventfd_write+0x8f/0x17b
 [<ffffffff810e9c4a>] do_munmap+0x2a6/0x332
 [<ffffffff810e9d16>] vm_munmap+0x40/0x5b
 [<ffffffff810e9d52>] sys_munmap+0x21/0x2a
 [<ffffffff81407192>] system_call_fastpath+0x16/0x1b
Code: 63 28 4c 89 ee 48 c7 c7 d0 41 a0 81 e8 50 6f 38 00 0f ae f0 4c 89 f7 ff
15 58 2b 9a 00 80 7d c7 00 75 04 eb 08 f3 90 f
BUG: soft lockup - CPU#4 stuck for 24s! [qemu-system-x86:57338]
Modules linked in: ext3 jbd vfat fat loop tun kvm_intel nfsv3 nfs_acl nfsv4
auth_rpcgss nfs fscache dns_resolver lockd pci_s]CPU 4
Pid: 57338, comm: qemu-system-x86 Tainted: P             3.7.0 #2 Intel
Corporation S2600CP/S2600CP
RIP: 0010:[<ffffffff81079459>]  [<ffffffff81079459>]
generic_exec_single+0x7f/0x90
RSP: 0018:ffff88082c8e9bc8  EFLAGS: 00000202
RAX: 00000000000000ff RBX: 0000000000000010 RCX: 0000000000000001
RDX: 7fffffffffffffff RSI: 00000000000000ff RDI: 000000000000003f
RBP: ffff88082c8e9c08 R08: 00000000000000c0 R09: ffff88043f68ce08
R10: 0000000000000004 R11: ffff88082abf8900 R12: 0000000000000292
R13: ffff88082c8e9b98 R14: 0000000000000002 R15: ffff88043f68cdf0
FS:  00007f28effff700(0000) GS:ffff88043f680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f28ffcb400d CR3: 000000082cd06000 CR4: 00000000000427e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 57338, threadinfo ffff88082c8e8000, task
ffff88082481e040)
Stack:
 0000000000000286 ffff88043f653000 ffff88043f008040 0000000000000002
 ffffffffa011b086 0000000000000001 0000000000000000 0000000000000004
 ffff88082c8e9c78 ffffffff810795ea ffff88043f653000 ffff88043f653000
Call Trace:
 [<ffffffffa011b086>] ? kvm_is_mmio_pfn+0x86/0x86 [kvm]
 [<ffffffff810795ea>] smp_call_function_single+0xdc/0xf4
 [<ffffffffa011b086>] ? kvm_is_mmio_pfn+0x86/0x86 [kvm]
 [<ffffffffa011b086>] ? kvm_is_mmio_pfn+0x86/0x86 [kvm]
 [<ffffffff810796e6>] smp_call_function_many+0xe4/0x1dd
 [<ffffffffa01200d3>] make_all_cpus_request+0xaf/0xba [kvm]
 [<ffffffffa01200fc>] kvm_make_mclock_inprogress_request+0xe/0x10 [kvm]
 [<ffffffffa012e71a>] vcpu_enter_guest+0x99/0x64f [kvm]
 [<ffffffffa033a107>] ? update_exception_bitmap+0x6b/0x6d [kvm_intel]
 [<ffffffffa03410f5>] ? vmx_vcpu_reset+0x370/0x3e4 [kvm_intel]
 [<ffffffffa012eda0>] __vcpu_run+0xd0/0x279 [kvm]
 [<ffffffffa0132836>] kvm_arch_vcpu_ioctl_run+0xe7/0x1a4 [kvm]
 [<ffffffffa012047e>] kvm_vcpu_ioctl+0x121/0x4e2 [kvm]
 [<ffffffff81076137>] ? wake_futex+0x57/0x6f
 [<ffffffff8107623a>] ? futex_wake+0xeb/0xfd
 [<ffffffff811170de>] do_vfs_ioctl+0x255/0x271
 [<ffffffff81078395>] ? sys_futex+0x10b/0x145
 [<ffffffff81117153>] sys_ioctl+0x59/0x7d
 [<ffffffff81407192>] system_call_fastpath+0x16/0x1b
Code: 45 c0 4c 89 f7 48 89 c6 e8 c9 72 38 00 48 39 5d c8 75 09 44 89 ef ff 15
d6 2e 9a 00 45 85 ff 75 04 eb 0a f3 90 41 f6 4
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
Pid: 57336, comm: qemu-system-x86 Tainted: P             3.7.0 #2
Call Trace:
 <NMI>  [<ffffffff813fe1ac>] panic+0xb9/0x1cf
 [<ffffffff810980c0>] watchdog_overflow_callback+0x7c/0xa1
 [<ffffffff810bf4d7>] __perf_event_overflow+0x137/0x1c1
 [<ffffffff810b96c1>] ? perf_event_update_userpage+0x19/0xe7
 [<ffffffff810bfb62>] perf_event_overflow+0x14/0x16
 [<ffffffff81013f86>] intel_pmu_handle_irq+0x253/0x2c9
 [<ffffffff8100424d>] ? show_regs+0x1fa/0x209
 [<ffffffff81401eb0>] perf_event_nmi_handler+0x19/0x1b
 [<ffffffff814017d6>] nmi_handle+0x48/0x6c
 [<ffffffff814018cc>] default_do_nmi+0x4d/0x1c2
 [<ffffffff81401aac>] do_nmi+0x6b/0xb1
 [<ffffffff81400ff7>] end_repeat_nmi+0x1e/0x2e
 [<ffffffff814006bc>] ? _raw_spin_lock+0x1c/0x20
 [<ffffffff814006bc>] ? _raw_spin_lock+0x1c/0x20
 [<ffffffff814006bc>] ? _raw_spin_lock+0x1c/0x20
 <<EOE>>  [<ffffffffa012e291>] kvm_guest_time_update+0x55/0x30d [kvm]
 [<ffffffffa012e793>] vcpu_enter_guest+0x112/0x64f [kvm]
 [<ffffffffa03410f5>] ? vmx_vcpu_reset+0x370/0x3e4 [kvm_intel]
 [<ffffffffa012eda0>] __vcpu_run+0xd0/0x279 [kvm]
 [<ffffffffa0132836>] kvm_arch_vcpu_ioctl_run+0xe7/0x1a4 [kvm]
 [<ffffffffa012047e>] kvm_vcpu_ioctl+0x121/0x4e2 [kvm]
 [<ffffffff81076137>] ? wake_futex+0x57/0x6f
 [<ffffffff8107623a>] ? futex_wake+0xeb/0xfd
 [<ffffffff811170de>] do_vfs_ioctl+0x255/0x271
 [<ffffffff81078395>] ? sys_futex+0x10b/0x145
 [<ffffffff81117153>] sys_ioctl+0x59/0x7d
 [<ffffffff81407192>] system_call_fastpath+0x16/0x1b
Shutting down cpus with NMI
