[ kvm-Bugs-1984384 ] soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]

"SourceForge.net" <noreply@xxxxxxxxxxxxxxx> · Tue, 25 Aug 2009 02:20:29 +0000

Bugs item #1984384, was opened at 2008-06-04 11:49
Message generated for change (Comment added) made by sf-robot
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1984384&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Rafal Wijata (ravpl)
Assigned to: Nobody/Anonymous (nobody)
Summary: soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]

Initial Comment:
I'm using kvm-69 running on
Linux 2.6.24.7-92.fc8 #1 SMP Wed May 7 16:26:02 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
kvm modules are loaded from kvm-69 rather than the kernel-provided ones

My system almost froze after I killed the qemu process.
I saw many, many tasks in 'D' state, along with [reiserfs/?] tasks.
Normally I would consider it a reiserfs bug (and maybe it is), but two things:
- it happened after the qemu process was killed (running with 6 cpus, 6G memory, 16G hdd placed on reiserfs placed on a 200M/s hdd)

- dmesg showed the following messages (2 total), which suggest it got stuck in kvm

BUG: soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]
CPU 5:
Modules linked in: ipt_REJECT nf_conntrack_ipv4 iptable_filter ip_tables kvm_intel(U) kvm(U) tun nfs lockd nfs_acl autofs4 coretemp hwmon fuse sunrpc bridge xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq reiserfs ext2 dm_mirror dm_multipath dm_mod i5000_edac iTCO_wdt serio_raw pcspkr iTCO_vendor_support e1000 button edac_core i2c_i801 ata_piix i2c_core pata_acpi ata_generic sg usb_storage ahci libata shpchp 3w_9xxx sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 4966, comm: qemu-system-x86 Not tainted 2.6.24.7-92.fc8 #1
RIP: 0010:[<ffffffff8834b29e>]  [<ffffffff8834b29e>] :kvm:rmap_remove+0x170/0x198
RSP: 0018:ffff8101f4df5bd8  EFLAGS: 00000246
RAX: 0000000000000002 RBX: ffff81004294af60 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000106 RDI: ffff8101770448c0
RBP: ffff8101ce0454d0 R08: ffffc20001b86030 R09: ffff8101d3587118
R10: 000000000019e7ea R11: ffff8101394dd9c0 R12: ffff8100240cece0
R13: 0000000000000000 R14: 000000000019e7ea R15: 0000000000000018
FS:  0000000000000000(0000) GS:ffff81021f049580(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000f7ff6000 CR3: 000000021b5e5000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
 [<ffffffff8834b1dd>] :kvm:rmap_remove+0xaf/0x198
 [<ffffffff8834b372>] :kvm:kvm_mmu_zap_page+0x8a/0x25e
 [<ffffffff8834b9f3>] :kvm:free_mmu_pages+0x12/0x34
 [<ffffffff8834bac9>] :kvm:kvm_mmu_destroy+0x1d/0x5e
 [<ffffffff88346979>] :kvm:kvm_arch_vcpu_uninit+0x1d/0x38
 [<ffffffff8834555b>] :kvm:kvm_vcpu_uninit+0x9/0x15
 [<ffffffff88163aa8>] :kvm_intel:vmx_free_vcpu+0x74/0x84
 [<ffffffff8834657b>] :kvm:kvm_arch_destroy_vm+0x69/0xb4
 [<ffffffff88345538>] :kvm:kvm_vcpu_release+0x13/0x18
 [<ffffffff810a35d4>] __fput+0xc2/0x18f
 [<ffffffff810a0de7>] filp_close+0x5d/0x65
 [<ffffffff8103b3df>] put_files_struct+0x66/0xc4
 [<ffffffff8103c6f7>] do_exit+0x28c/0x76b
 [<ffffffff8103cc55>] sys_exit_group+0x0/0xe
 [<ffffffff81044163>] get_signal_to_deliver+0x3aa/0x3d8
 [<ffffffff8100b359>] do_notify_resume+0xa8/0x732
 [<ffffffff8126b7f6>] unlock_kernel+0x32/0x33
 [<ffffffff881c01db>] :reiserfs:reiserfs_setattr+0x26e/0x27d
 [<ffffffff810a1866>] do_truncate+0x70/0x79
 [<ffffffff8100bf17>] sysret_signal+0x1c/0x27
 [<ffffffff8100c1a7>] ptregscall_common+0x67/0xb0

----------------------------------------------------------------------

>Comment By: SourceForge Robot (sf-robot)
Date: 2009-08-25 02:20

Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).

----------------------------------------------------------------------

Comment By: Avi Kivity (avik)
Date: 2009-08-10 12:27

Message:
Should be fixed in git.

----------------------------------------------------------------------

Comment By: Avi Kivity (avik)
Date: 2008-06-04 15:45

Message:
Logged In: YES 
user_id=539971
Originator: NO

Okay, I added a cond_resched() in free_mmu_pages().  That should avoid
the softlockup tick.
File Added: prevent-softlockup-on-kvm-destroy.patch
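
Editorial sketch, not the attached patch: the idea behind the fix can be
illustrated in userspace C. A long teardown loop that never gives up the CPU
is what the soft lockup watchdog complains about; offering to reschedule
inside the loop bounds how long the task runs uninterrupted. Everything below
(struct fake_page, build_pages(), free_pages_yielding(), the batch parameter)
is hypothetical, and sched_yield() is only a userspace stand-in for the
kernel's cond_resched(), which reschedules only when a reschedule is pending.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one shadow page on a VM's active list. */
struct fake_page {
    struct fake_page *next;
};

/* Build a singly linked list of n nodes.
 * Returns NULL when n == 0 or when an allocation fails. */
struct fake_page *build_pages(size_t n)
{
    struct fake_page *head = NULL;

    for (size_t i = 0; i < n; i++) {
        struct fake_page *p = malloc(sizeof(*p));
        if (p == NULL) {
            /* Allocation failed: release what was built so far. */
            while (head != NULL) {
                struct fake_page *next = head->next;
                free(head);
                head = next;
            }
            return NULL;
        }
        p->next = head;
        head = p;
    }
    return head;
}

/* Free every node, offering the CPU to other tasks after each `batch`
 * nodes (batch == 0 disables yielding). Returns the number of nodes
 * freed and stores the number of yield points taken in *yields. */
size_t free_pages_yielding(struct fake_page *head, size_t batch,
                           size_t *yields)
{
    size_t freed = 0;

    assert(yields != NULL);
    *yields = 0;

    while (head != NULL) {
        struct fake_page *next = head->next;

        free(head);
        head = next;
        freed++;

        /* Userspace analogue of calling cond_resched() in the loop. */
        if (batch != 0 && freed % batch == 0) {
            sched_yield();
            (*yields)++;
        }
    }
    return freed;
}
```

For example, free_pages_yielding(build_pages(10), 4, &yields) frees 10 nodes
and passes through two yield points (after the 4th and 8th node).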

----------------------------------------------------------------------

Comment By: david ahern (dsahern)
Date: 2008-06-04 15:08

Message:
Logged In: YES 
user_id=1755596
Originator: NO

My host did not crash, only the guest. I actually was not aware it had
gone down until I went to log in. At that point I went digging through
syslog to find out when it died (my control scripts log startup and
shutdown). The host has not been rebooted, and I have not seen any problems
starting guests.

The guest that terminated has 2 cpus and 2GB of RAM and runs RHEL3 as the
OS. The host has 6 GB of RAM. 

----------------------------------------------------------------------

Comment By: Rafal Wijata (ravpl)
Date: 2008-06-04 14:58

Message:
Logged In: YES 
user_id=996150
Originator: YES

In my case it recovered after a while, ~2-3 minutes.

----------------------------------------------------------------------

Comment By: Avi Kivity (avik)
Date: 2008-06-04 14:33

Message:
Logged In: YES 
user_id=539971
Originator: NO

Did the system recover later?

David, how much memory did you assign to the guest?

----------------------------------------------------------------------

Comment By: david ahern (dsahern)
Date: 2008-06-04 14:20

Message:
Logged In: YES 
user_id=1755596
Originator: NO

I hit this issue yesterday as well. Host is running 2.6.26-rc3 from
kvm.git, per-page-pte-tracking branch. At the time a VM had been up and
running for ~24 hours, and I was installing another VM. The guest that had
been running for ~24 hours terminated abruptly.

[4776654.043860] BUG: soft lockup - CPU#0 stuck for 94s! [ksoftirqd/0:4]
[4776654.043860] CPU 0:
[4776654.043860] Modules linked in: tun bridge llc iptable_filter ip_tables x_tables kvm_intel kvm usbhid ahci ata_piix libata ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore
[4776654.043860] Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.26-rc3-00969-g7cce43a #1
[4776654.043860] RIP: 0010:[<ffffffff812cf205>]  [<ffffffff812cf205>] _spin_unlock_irq+0xc/0x2a
[4776654.043860] RSP: 0018:ffffffff81510f38  EFLAGS: 00000202
[4776654.043860] RAX: ffffffff81510f48 RBX: ffffffff81510f38 RCX: ffffffff811d8fad
[4776654.043860] RDX: ffffffff81510f48 RSI: 000000000000561a RDI: 0000000000000001
[4776654.043860] RBP: ffffffff81510eb0 R08: ffff8101a61c3d58 R09: 0000000000000000
[4776654.043860] R10: 000000008147fe40 R11: ffff8101a786c648 R12: ffffffff8100cc46
[4776654.043860] R13: ffffffff81510eb0 R14: ffffffff8156e080 R15: ffff8101a61c3d20
[4776654.043860] FS:  0000000000000000(0000) GS:ffffffff8148a000(0000) knlGS:0000000000000000
[4776654.043860] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[4776654.043860] CR2: 0000000003ca8fb0 CR3: 0000000112592000 CR4: 00000000000026e0
[4776654.043860] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4776654.043860] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4776654.043860] 
[4776654.043860] Call Trace:
[4776654.043860]  <IRQ>  [<ffffffff81038ba6>] ? run_timer_softirq+0x163/0x1f1
[4776654.043860]  [<ffffffff81035097>] ? __do_softirq+0x4b/0xc5
[4776654.043860]  [<ffffffff810354a2>] ? ksoftirqd+0x0/0x123
[4776654.043860]  [<ffffffff8100d19c>] ? call_softirq+0x1c/0x28
[4776654.043860]  <EOI>  [<ffffffff8100ea88>] ? do_softirq+0x34/0x72
[4776654.043860]  [<ffffffff81035506>] ? ksoftirqd+0x64/0x123
[4776654.043860]  [<ffffffff81042c83>] ? kthread+0x49/0x76
[4776654.043860]  [<ffffffff8100ce28>] ? child_rip+0xa/0x12
[4776654.043860]  [<ffffffff81042c3a>] ? kthread+0x0/0x76
[4776654.043860]  [<ffffffff8100ce1e>] ? child_rip+0x0/0x12
[4776654.043860] 
[4776654.043860] BUG: soft lockup - CPU#1 stuck for 93s! [ksoftirqd/1:7]
[4776654.043860] CPU 1:
[4776654.043860] Modules linked in: tun bridge llc iptable_filter ip_tables x_tables kvm_intel kvm usbhid ahci ata_piix libata ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore
[4776654.043860] Pid: 7, comm: ksoftirqd/1 Not tainted 2.6.26-rc3-00969-g7cce43a #1
[4776654.043860] RIP: 0010:[<ffffffff812cf205>]  [<ffffffff812cf205>] _spin_unlock_irq+0xc/0x2a
[4776654.043860] RSP: 0018:ffff8101a789bf38  EFLAGS: 00000206
[4776654.043860] RAX: ffff8101a789bf48 RBX: ffff8101a789bf38 RCX: ffffffff81038e1c
[4776654.043860] RDX: ffff8101a789bf48 RSI: 0000000000000bef RDI: 0000000000000001
[4776654.043860] RBP: ffff8101a789beb0 R08: ffff8101a64d5b60 R09: 0000000000000000
[4776654.043860] R10: ffff8101a60c1500 R11: ffff8101a50daa48 R12: ffffffff8100cc46
[4776654.043860] R13: ffff8101a789beb0 R14: ffff8101a788c000 R15: ffff8101a64d5b28
[4776654.043860] FS:  0000000000000000(0000) GS:ffff8101a7805580(0000) knlGS:0000000000000000
[4776654.043860] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[4776654.043860] CR2: 0000000000e65228 CR3: 000000007d4bd000 CR4: 00000000000026e0
[4776654.043860] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4776654.043860] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4776654.043860] 
[4776654.043860] Call Trace:
[4776654.043860]  <IRQ>  [<ffffffff81038ba6>] ? run_timer_softirq+0x163/0x1f1
[4776654.043860]  [<ffffffff81035097>] ? __do_softirq+0x4b/0xc5
[4776654.043860]  [<ffffffff810354a2>] ? ksoftirqd+0x0/0x123
[4776654.043860]  [<ffffffff8100d19c>] ? call_softirq+0x1c/0x28
[4776654.043860]  <EOI>  [<ffffffff8100ea88>] ? do_softirq+0x34/0x72
[4776654.043860]  [<ffffffff81035506>] ? ksoftirqd+0x64/0x123
[4776654.043860]  [<ffffffff81042c83>] ? kthread+0x49/0x76
[4776654.043860]  [<ffffffff8100ce28>] ? child_rip+0xa/0x12
[4776654.043860]  [<ffffffff81042c3a>] ? kthread+0x0/0x76
[4776654.043860]  [<ffffffff8100ce1e>] ? child_rip+0x0/0x12
[4776654.043860] 

----------------------------------------------------------------------

Comment By: Avi Kivity (avik)
Date: 2008-06-04 12:19

Message:
Logged In: YES 
user_id=539971
Originator: NO

It's a kvm bug; kvm is spending too much time tearing down the page
tables.

----------------------------------------------------------------------
