Hello, one of our VMs regularly get stuck: the VM is completely unresponsive (no ssh, no serial console, no VNC). Using "gdbserver" and a remote system to debug the running VM, I see 3 CPUs (1,3,4) stuck in pgd_alloc() → spin_lock_irqsave(pgd_lock) while the 4th CPU (2) is waiting in pgd_alloc() → pgd_prepopulate_pmb() →... → flush_tlb_others_ipi() 195 while (!cpumask_empty(to_cpumask(f->flush_cpumask))) 196 cpu_relax(); (gdb) print f->flush_cpumask $5 = {1} CPU 1 is duing a do_exec() syscall, will CPU 2-4 are doing a do_fork() syscall according to "thread apply all backtrace". After a "set variable f->flush_cpumask 0" from gdb the kernel continued dumping the trace-informations, which I attached. Host: Debian linux-2.6.32-38-amd64 (=2.6.32.42), 8 Cores Kvm: 0.14.1+dfsg Guest: Debian linux-2.6.32-38-i686-bigmem, 4 CPUs Is this a known bug and/or is a fix available? I can gather more information from the VM if needd. Sincerely Philipp -- Philipp Hahn Open Source Software Engineer hahn@xxxxxxxxxxxxx Univention GmbH Linux for Your Business fon: +49 421 22 232- 0 Mary-Somerville-Str.1 D-28359 Bremen fax: +49 421 22 232-99 http://www.univention.de/
[261850.404057] BUG: soft lockup - CPU#1 stuck for 219010s! [sshd:2707] [261850.404057] Modules linked in: binfmt_misc nfsd exportfs xt_tcpudp nf_conntrack_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables nfs lockd fscache nfs_acl auth_rpcgss sunrpc quota_v2 quota_tree sd_mod crc_t10dif ide_generic ide_gd_mod ide_core cirrusfb 8139cp mii snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 psmouse i2c_core i6300esb virtio_balloon joydev serio_raw pcspkr processor evdev usbhid hid ext3 jbd dm_snapshot dm_mirror dm_region_hash dm_log dm_mod uhci_hcd ata_generic virtio_blk virtio_net ehci_hcd ata_piix thermal usbcore nls_base virtio_pci floppy thermal_sys libata button [last unloaded: scsi_wait_scan] [261850.404057] [261850.404057] Pid: 2707, comm: sshd Not tainted (2.6.32-ucs52-686-bigmem #1) Bochs [261850.404057] EIP: 0060:[<c1293e1d>] EFLAGS: 00000292 CPU: 1 [261850.404057] EIP is at _spin_unlock_irqrestore+0x9/0xf [261850.404057] EAX: 00000292 EBX: 00000003 ECX: 00000800 EDX: 00000292 [261850.404057] ESI: 00000000 EDI: 00000001 EBP: f55f3eb4 ESP: f55f3e64 [261850.404057] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [261850.404057] CR0: 8005003b CR2: b72a1000 CR3: 35607000 CR4: 000006f0 [261850.404057] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [261850.404057] DR6: ffff0ff0 DR7: 00000400 [261850.404057] Call Trace: [261850.404057] [<c102422e>] ? pgd_alloc+0x1c7/0x21a [261850.404057] [<c1034908>] mm_init+0xa8/0xd4 [261850.404057] [<c1034dd1>] ? dup_mm+0x6a/0x389 [261850.404057] [<c10356bf>] ? copy_process+0x57a/0xf28 [261850.404057] [<c1035a60>] ? copy_process+0x91b/0xf28 [261850.404057] [<c10361a7>] ? do_fork+0x13a/0x2bc [261850.404057] [<c1006e6e>] ? sys_clone+0x21/0x27 [261850.404057] [<c100829c>] ? syscall_call+0x7/0xb [261850.404058] BUG: soft lockup - CPU#2 stuck for 218991s! [runsv:1236] [261850.404058] Modules linked in: binfmt_misc nfsd exportfs xt_tcpudp nf_conntrack_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables nfs lockd fscache nfs_acl auth_rpcgss sunrpc quota_v2 quota_tree sd_mod crc_t10dif ide_generic ide_gd_mod ide_core cirrusfb 8139cp mii snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 psmouse i2c_core i6300esb virtio_balloon joydev serio_raw pcspkr processor evdev usbhid hid ext3 jbd dm_snapshot dm_mirror dm_region_hash dm_log dm_mod uhci_hcd ata_generic virtio_blk virtio_net ehci_hcd ata_piix thermal usbcore nls_base virtio_pci floppy thermal_sys libata button [last unloaded: scsi_wait_scan] [261850.404058] [261850.404058] Pid: 1236, comm: runsv Not tainted (2.6.32-ucs52-686-bigmem #1) Bochs [261850.404058] EIP: 0060:[<c1293e1d>] EFLAGS: 00000296 CPU: 2 [261850.404058] EIP is at _spin_unlock_irqrestore+0x9/0xf [261850.404058] EAX: 00000296 EBX: 00000003 ECX: 00000003 EDX: 00000296 [261850.404058] ESI: 00000000 EDI: 00000001 EBP: f602febc ESP: f602fe6c [261850.404058] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [261850.404058] CR0: 8005003b CR2: 0810acf4 CR3: 36a91000 CR4: 000006f0 [261850.404058] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [261850.404058] DR6: ffff0ff0 DR7: 00000400 [261850.404058] Call Trace: [261850.404058] [<c102422e>] ? pgd_alloc+0x1c7/0x21a [261850.404058] [<c1034908>] mm_init+0xa8/0xd4 [261850.404058] [<c1034dd1>] ? dup_mm+0x6a/0x389 [261850.404058] [<c10356bf>] ? copy_process+0x57a/0xf28 [261850.404058] [<c1035a60>] ? copy_process+0x91b/0xf28 [261850.404058] [<c10361a7>] ? do_fork+0x13a/0x2bc [261850.404058] [<c10bae59>] ? do_sync_read+0x0/0x107 [261850.404058] [<c10bb8ae>] ? vfs_read+0x7b/0xd3 [261850.404058] [<c100dc12>] ? sys_fork+0x15/0x19 [261850.404058] [<c100829c>] ? syscall_call+0x7/0xb [261850.438247] BUG: soft lockup - CPU#0 stuck for 218999s! [pbuilder-update:2713] [261850.438247] Modules linked in: binfmt_misc nfsd exportfs xt_tcpudp nf_conntrack_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables nfs lockd fscache nfs_acl auth_rpcgss sunrpc quota_v2 quota_tree sd_mod crc_t10dif ide_generic ide_gd_mod ide_core cirrusfb 8139cp mii snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 psmouse i2c_core i6300esb virtio_balloon joydev serio_raw pcspkr processor evdev usbhid hid ext3 jbd dm_snapshot dm_mirror dm_region_hash dm_log dm_mod uhci_hcd ata_generic virtio_blk virtio_net ehci_hcd ata_piix thermal usbcore nls_base virtio_pci floppy thermal_sys libata button [last unloaded: scsi_wait_scan] [261850.438247] [261850.438247] Pid: 2713, comm: pbuilder-update Not tainted (2.6.32-ucs52-686-bigmem #1) Bochs [261850.438247] EIP: 0060:[<c1293e1d>] EFLAGS: 00200292 CPU: 0 [261850.438247] EIP is at _spin_unlock_irqrestore+0x9/0xf [261850.438247] EAX: 00200292 EBX: 00000003 ECX: 00000000 EDX: 00200292 [261850.438247] ESI: 00000000 EDI: 00000001 EBP: f54ddf4c ESP: f54ddefc [261850.438247] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [261850.438247] CR0: 8005003b CR2: b73611a0 CR3: 35683000 CR4: 000006f0 [261850.438247] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [261850.438247] DR6: ffff0ff0 DR7: 00000400 [261850.438247] Call Trace: [261850.438247] [<c102422e>] ? pgd_alloc+0x1c7/0x21a [261850.438247] [<c1292e1f>] ? _cond_resched+0x25/0x3c [261850.438247] [<c1292e56>] ? wait_for_common+0x20/0x100 [261850.438247] [<c1034908>] mm_init+0xa8/0xd4 [261850.438247] [<c10bfb14>] ? bprm_mm_init+0x14/0x163 [261850.438247] [<c10bfffc>] ? do_execve+0xe3/0x25b [261850.438247] [<c1006e29>] ? sys_execve+0x23/0x47 [261850.438247] [<c100829c>] ? syscall_call+0x7/0xb [261850.483139] BUG: soft lockup - CPU#3 stuck for 218991s! [nrpe:1254] [261850.483139] Modules linked in: binfmt_misc nfsd exportfs xt_tcpudp nf_conntrack_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables nfs lockd fscache nfs_acl auth_rpcgss sunrpc quota_v2 quota_tree sd_mod crc_t10dif ide_generic ide_gd_mod ide_core cirrusfb 8139cp mii snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 psmouse i2c_core i6300esb virtio_balloon joydev serio_raw pcspkr processor evdev usbhid hid ext3 jbd dm_snapshot dm_mirror dm_region_hash dm_log dm_mod uhci_hcd ata_generic virtio_blk virtio_net ehci_hcd ata_piix thermal usbcore nls_base virtio_pci floppy thermal_sys libata button [last unloaded: scsi_wait_scan] [261850.483139] [261850.483139] Pid: 1254, comm: nrpe Not tainted (2.6.32-ucs52-686-bigmem #1) Bochs [261850.483139] EIP: 0060:[<c1293e1d>] EFLAGS: 00200292 CPU: 3 [261850.483139] EIP is at _spin_unlock_irqrestore+0x9/0xf [261850.483139] EAX: 00200292 EBX: 00000003 ECX: 00000004 EDX: 00200292 [261850.483139] ESI: 00000000 EDI: 00000001 EBP: f647feb4 ESP: f647fe64 [261850.483139] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [261850.483139] CR0: 8005003b CR2: b74ee580 CR3: 36010000 CR4: 000006f0 [261850.483139] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [261850.483139] DR6: ffff0ff0 DR7: 00000400 [261850.483139] Call Trace: [261850.483139] [<c102422e>] ? pgd_alloc+0x1c7/0x21a [261850.483139] [<c1034908>] mm_init+0xa8/0xd4 [261850.483139] [<c1034dd1>] ? dup_mm+0x6a/0x389 [261850.483139] [<c10356bf>] ? copy_process+0x57a/0xf28 [261850.483139] [<c1035a60>] ? copy_process+0x91b/0xf28 [261850.483139] [<c10361a7>] ? do_fork+0x13a/0x2bc [261850.483139] [<c114cae3>] ? copy_to_user+0x29/0xf8 [261850.483139] [<c1006e6e>] ? sys_clone+0x21/0x27 [261850.483139] [<c100829c>] ? syscall_call+0x7/0xb [261852.541339] nfs: RPC call returned error 88
Attachment:
signature.asc
Description: This is a digitally signed message part.