Hello,

good news:

On Friday 30 March 2012 19:44:50 you wrote:
> On Monday 09 January 2012 12:41:41 Philipp Hahn wrote:
> > one of our VMs regularly gets stuck: the VM is completely unresponsive (no
> > ssh, no serial console, no VNC). Using "gdbserver" and a remote system to
> > debug the running VM, I see 3 CPUs (1,3,4) stuck in
> >   pgd_alloc() → spin_lock_irqsave(pgd_lock)
> > while the 4th CPU (2) is waiting in
> >   pgd_alloc() → pgd_prepopulate_pmd() →... → flush_tlb_others_ipi()
> >
> >   195             while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
> >   196                     cpu_relax();
> >   (gdb) print f->flush_cpumask
> >   $5 = {1}
> >
> > CPU 1 is doing a do_exec() syscall, while CPUs 2-4 are doing a do_fork()
> > syscall according to "thread apply all backtrace".

It's a guest kernel bug that was already fixed in v2.6.38 [1], but has not
(yet) been back-ported to 2.6.32-longterm.
Commit [2] fixed a bug with TLB flushing when using PAE, which made the
hidden bug trigger a lot more often.
It only happens when using a PAE-enabled guest kernel with >=2 CPUs.

Full details are in our German Bugzilla [3].

[1] <http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;h=a79e53d85683c6dd9f99c90511028adc2043031f>
[2] <http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;h=4981d01eada5354d81c8929d5b2836829ba3df7b>
[3] <https://forge.univention.org/bugzilla/show_bug.cgi?id=26661>

Sincerely
Philipp
-- 
Philipp Hahn           Open Source Software Engineer      hahn@xxxxxxxxxxxxx
Univention GmbH        be open.                       fon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen                 fax: +49 421 22 232-99
                                                   http://www.univention.de/