Bugs item #1984384, was opened at 2008-06-04 14:49
Message generated for change (Comment added) made by avik
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1984384&group_id=180599

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
>Status: Pending
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Rafal Wijata (ravpl)
Assigned to: Nobody/Anonymous (nobody)
Summary: soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]

Initial Comment:
I'm using kvm-69 running on
Linux 2.6.24.7-92.fc8 #1 SMP Wed May 7 16:26:02 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
with the kvm modules loaded from kvm-69 rather than the kernel-provided ones.

My system almost froze after I killed the qemu process. I saw many, many
tasks in 'D' state, along with [reiserfs/?] tasks. Normally I would
consider it a reiserfs bug (and maybe it is), but for two things:
- it happened right after the qemu process was killed (the guest was
  running with 6 CPUs, 6G memory, and a 16G hdd image on reiserfs on a
  200M/s hdd)
- dmesg showed the following messages (2 in total), which suggest it got
  stuck in kvm

BUG: soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]
CPU 5:
Modules linked in: ipt_REJECT nf_conntrack_ipv4 iptable_filter ip_tables kvm_intel(U) kvm(U) tun nfs lockd nfs_acl autofs4 coretemp hwmon fuse sunrpc bridge xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq reiserfs ext2 dm_mirror dm_multipath dm_mod i5000_edac iTCO_wdt serio_raw pcspkr iTCO_vendor_support e1000 button edac_core i2c_i801 ata_piix i2c_core pata_acpi ata_generic sg usb_storage ahci libata shpchp 3w_9xxx sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 4966, comm: qemu-system-x86 Not tainted 2.6.24.7-92.fc8 #1
RIP: 0010:[<ffffffff8834b29e>]  [<ffffffff8834b29e>] :kvm:rmap_remove+0x170/0x198
RSP: 0018:ffff8101f4df5bd8  EFLAGS: 00000246
RAX: 0000000000000002 RBX: ffff81004294af60 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000106 RDI: ffff8101770448c0
RBP: ffff8101ce0454d0 R08: ffffc20001b86030 R09: ffff8101d3587118
R10: 000000000019e7ea R11: ffff8101394dd9c0 R12: ffff8100240cece0
R13: 0000000000000000 R14: 000000000019e7ea R15: 0000000000000018
FS:  0000000000000000(0000) GS:ffff81021f049580(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000f7ff6000 CR3: 000000021b5e5000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
 [<ffffffff8834b1dd>] :kvm:rmap_remove+0xaf/0x198
 [<ffffffff8834b372>] :kvm:kvm_mmu_zap_page+0x8a/0x25e
 [<ffffffff8834b9f3>] :kvm:free_mmu_pages+0x12/0x34
 [<ffffffff8834bac9>] :kvm:kvm_mmu_destroy+0x1d/0x5e
 [<ffffffff88346979>] :kvm:kvm_arch_vcpu_uninit+0x1d/0x38
 [<ffffffff8834555b>] :kvm:kvm_vcpu_uninit+0x9/0x15
 [<ffffffff88163aa8>] :kvm_intel:vmx_free_vcpu+0x74/0x84
 [<ffffffff8834657b>] :kvm:kvm_arch_destroy_vm+0x69/0xb4
 [<ffffffff88345538>] :kvm:kvm_vcpu_release+0x13/0x18
 [<ffffffff810a35d4>] __fput+0xc2/0x18f
 [<ffffffff810a0de7>] filp_close+0x5d/0x65
 [<ffffffff8103b3df>] put_files_struct+0x66/0xc4
 [<ffffffff8103c6f7>] do_exit+0x28c/0x76b
 [<ffffffff8103cc55>] sys_exit_group+0x0/0xe
 [<ffffffff81044163>] get_signal_to_deliver+0x3aa/0x3d8
 [<ffffffff8100b359>] do_notify_resume+0xa8/0x732
 [<ffffffff8126b7f6>] unlock_kernel+0x32/0x33
 [<ffffffff881c01db>] :reiserfs:reiserfs_setattr+0x26e/0x27d
 [<ffffffff810a1866>] do_truncate+0x70/0x79
 [<ffffffff8100bf17>] sysret_signal+0x1c/0x27
 [<ffffffff8100c1a7>] ptregscall_common+0x67/0xb0
----------------------------------------------------------------------

>Comment By: Avi Kivity (avik)
Date: 2009-08-10 15:27

Message:
Should be fixed in git.

----------------------------------------------------------------------

Comment By: Avi Kivity (avik)
Date: 2008-06-04 18:45

Message:
Logged In: YES
user_id=539971
Originator: NO

Okay, I added a cond_resched() in free_mmu_pages(). That should avoid
the softlockup tick.

File Added: prevent-softlockup-on-kvm-destroy.patch
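[Editor's note: the attached patch is not reproduced in this archive. For
illustration only, the one-liner Avi describes would look something like
the sketch below. The function body is an assumption based on the
kvm-69-era sources -- a loop that walks the VM's active shadow-page list
and zaps each page, consistent with the free_mmu_pages ->
kvm_mmu_zap_page -> rmap_remove frames in the trace above -- not a copy
of the attached patch:

	static void free_mmu_pages(struct kvm_vcpu *vcpu)
	{
		struct kvm_mmu_page *sp;

		while (!list_empty(&vcpu->kvm->arch.active_mmu_pages)) {
			sp = container_of(vcpu->kvm->arch.active_mmu_pages.next,
					  struct kvm_mmu_page, link);
			kvm_mmu_zap_page(vcpu->kvm, sp);
			/* The added line: a multi-GB guest can put an
			 * enormous number of shadow pages on this list, so
			 * yield once per iteration to keep the watchdog
			 * fed. This runs in process context during VM
			 * teardown, so rescheduling here is safe. */
			cond_resched();
		}
		free_page((unsigned long)vcpu->arch.mmu.pae_root);
	}
]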
----------------------------------------------------------------------

Comment By: david ahern (dsahern)
Date: 2008-06-04 18:08

Message:
Logged In: YES
user_id=1755596
Originator: NO

My host did not crash, only the guest. I actually was not aware it had
gone down until I went to log in. At that point I went digging through
syslog to find out when it died (my control scripts log startup and
shutdown). The host has not been rebooted, and I have not seen any
problems starting guests.

The guest that terminated has 2 cpus and 2GB of RAM and runs RHEL3 as
the OS. The host has 6 GB of RAM.

----------------------------------------------------------------------

Comment By: Rafal Wijata (ravpl)
Date: 2008-06-04 17:58

Message:
Logged In: YES
user_id=996150
Originator: YES

In my case it recovered after a while, ~2-3 minutes.

----------------------------------------------------------------------

Comment By: Avi Kivity (avik)
Date: 2008-06-04 17:33

Message:
Logged In: YES
user_id=539971
Originator: NO

Did the system recover later?

David, how much memory did you assign to the guest?

----------------------------------------------------------------------

Comment By: david ahern (dsahern)
Date: 2008-06-04 17:20

Message:
Logged In: YES
user_id=1755596
Originator: NO

I hit this issue yesterday as well. Host is running 2.6.26-rc3 from
kvm.git, per-page-pte-tracking branch. At the time one VM had been up
and running for ~24 hours, and I was installing another VM. The guest
that had been running for ~24 hours terminated abruptly.

[4776654.043860] BUG: soft lockup - CPU#0 stuck for 94s! [ksoftirqd/0:4]
[4776654.043860] CPU 0:
[4776654.043860] Modules linked in: tun bridge llc iptable_filter ip_tables x_tables kvm_intel kvm usbhid ahci ata_piix libata ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore
[4776654.043860] Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.26-rc3-00969-g7cce43a #1
[4776654.043860] RIP: 0010:[<ffffffff812cf205>]  [<ffffffff812cf205>] _spin_unlock_irq+0xc/0x2a
[4776654.043860] RSP: 0018:ffffffff81510f38  EFLAGS: 00000202
[4776654.043860] RAX: ffffffff81510f48 RBX: ffffffff81510f38 RCX: ffffffff811d8fad
[4776654.043860] RDX: ffffffff81510f48 RSI: 000000000000561a RDI: 0000000000000001
[4776654.043860] RBP: ffffffff81510eb0 R08: ffff8101a61c3d58 R09: 0000000000000000
[4776654.043860] R10: 000000008147fe40 R11: ffff8101a786c648 R12: ffffffff8100cc46
[4776654.043860] R13: ffffffff81510eb0 R14: ffffffff8156e080 R15: ffff8101a61c3d20
[4776654.043860] FS:  0000000000000000(0000) GS:ffffffff8148a000(0000) knlGS:0000000000000000
[4776654.043860] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[4776654.043860] CR2: 0000000003ca8fb0 CR3: 0000000112592000 CR4: 00000000000026e0
[4776654.043860] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4776654.043860] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4776654.043860]
[4776654.043860] Call Trace:
[4776654.043860] <IRQ>  [<ffffffff81038ba6>] ? run_timer_softirq+0x163/0x1f1
[4776654.043860] [<ffffffff81035097>] ? __do_softirq+0x4b/0xc5
[4776654.043860] [<ffffffff810354a2>] ? ksoftirqd+0x0/0x123
[4776654.043860] [<ffffffff8100d19c>] ? call_softirq+0x1c/0x28
[4776654.043860] <EOI>  [<ffffffff8100ea88>] ? do_softirq+0x34/0x72
[4776654.043860] [<ffffffff81035506>] ? ksoftirqd+0x64/0x123
[4776654.043860] [<ffffffff81042c83>] ? kthread+0x49/0x76
[4776654.043860] [<ffffffff8100ce28>] ? child_rip+0xa/0x12
[4776654.043860] [<ffffffff81042c3a>] ? kthread+0x0/0x76
[4776654.043860] [<ffffffff8100ce1e>] ? child_rip+0x0/0x12
[4776654.043860]
[4776654.043860] BUG: soft lockup - CPU#1 stuck for 93s! [ksoftirqd/1:7]
[4776654.043860] CPU 1:
[4776654.043860] Modules linked in: tun bridge llc iptable_filter ip_tables x_tables kvm_intel kvm usbhid ahci ata_piix libata ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore
[4776654.043860] Pid: 7, comm: ksoftirqd/1 Not tainted 2.6.26-rc3-00969-g7cce43a #1
[4776654.043860] RIP: 0010:[<ffffffff812cf205>]  [<ffffffff812cf205>] _spin_unlock_irq+0xc/0x2a
[4776654.043860] RSP: 0018:ffff8101a789bf38  EFLAGS: 00000206
[4776654.043860] RAX: ffff8101a789bf48 RBX: ffff8101a789bf38 RCX: ffffffff81038e1c
[4776654.043860] RDX: ffff8101a789bf48 RSI: 0000000000000bef RDI: 0000000000000001
[4776654.043860] RBP: ffff8101a789beb0 R08: ffff8101a64d5b60 R09: 0000000000000000
[4776654.043860] R10: ffff8101a60c1500 R11: ffff8101a50daa48 R12: ffffffff8100cc46
[4776654.043860] R13: ffff8101a789beb0 R14: ffff8101a788c000 R15: ffff8101a64d5b28
[4776654.043860] FS:  0000000000000000(0000) GS:ffff8101a7805580(0000) knlGS:0000000000000000
[4776654.043860] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[4776654.043860] CR2: 0000000000e65228 CR3: 000000007d4bd000 CR4: 00000000000026e0
[4776654.043860] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4776654.043860] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4776654.043860]
[4776654.043860] Call Trace:
[4776654.043860] <IRQ>  [<ffffffff81038ba6>] ? run_timer_softirq+0x163/0x1f1
[4776654.043860] [<ffffffff81035097>] ? __do_softirq+0x4b/0xc5
[4776654.043860] [<ffffffff810354a2>] ? ksoftirqd+0x0/0x123
[4776654.043860] [<ffffffff8100d19c>] ? call_softirq+0x1c/0x28
[4776654.043860] <EOI>  [<ffffffff8100ea88>] ? do_softirq+0x34/0x72
[4776654.043860] [<ffffffff81035506>] ? ksoftirqd+0x64/0x123
[4776654.043860] [<ffffffff81042c83>] ? kthread+0x49/0x76
[4776654.043860] [<ffffffff8100ce28>] ? child_rip+0xa/0x12
[4776654.043860] [<ffffffff81042c3a>] ? kthread+0x0/0x76
[4776654.043860] [<ffffffff8100ce1e>] ? child_rip+0x0/0x12
[4776654.043860]

----------------------------------------------------------------------

Comment By: Avi Kivity (avik)
Date: 2008-06-04 15:19

Message:
Logged In: YES
user_id=539971
Originator: NO

It's a kvm bug; kvm is spending too much time tearing down the page
tables.
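[Editor's note: for readers wondering where the "stuck for 11s!
[qemu-system-x86:4966]" message itself comes from, the following is a
simplified sketch loosely modeled on the 2.6.24-era soft-lockup detector
(kernel/softlockup.c); the real code differs in detail. A per-CPU
watchdog thread refreshes a timestamp whenever it gets to run, and the
timer tick checks that timestamp, so any kernel path that loops for
roughly 10+ seconds without scheduling -- like the page-table teardown
above -- trips the warning:

	/* Checked from the timer interrupt on every tick: */
	if (now > touch_timestamp + 10)	/* watchdog starved for ~10s */
		printk(KERN_ERR
		       "BUG: soft lockup - CPU#%d stuck for %lus! [%s:%d]\n",
		       this_cpu, now - touch_timestamp,
		       current->comm, current->pid);

This is why a single cond_resched() in the teardown loop is sufficient:
it lets the watchdog thread run and refresh the timestamp before the
threshold is crossed.]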
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1984384&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html