Re: [SOLVED] 2.6.32 stuck in flush_tlb_others_ipi()

Philipp Hahn <hahn@xxxxxxxxxxxxx> · Thu, 19 Apr 2012 15:42:31 +0200

Hello,

good news:

On Friday 30 March 2012 19:44:50 you wrote:
> On Monday 09 January 2012 12:41:41 Philipp Hahn wrote:
> > one of our VMs regularly gets stuck: the VM is completely unresponsive (no
> > ssh, no serial console, no VNC). Using "gdbserver" and a remote system to
> > debug the running VM, I see 3 CPUs (1,3,4) stuck in
> >  pgd_alloc() → spin_lock_irqsave(pgd_lock)
> > while the 4th CPU (2) is waiting in
> >  pgd_alloc() → pgd_prepopulate_pmd() →... →  flush_tlb_others_ipi()
> >
> > 195                     while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
> > 196                             cpu_relax();
> > (gdb) print f->flush_cpumask
> > $5 = {1}
> >
> > CPU 1 is doing a do_exec() syscall, while CPUs 2-4 are doing a do_fork()
> > syscall, according to "thread apply all backtrace".

It's a guest kernel bug already fixed in v2.6.38 [1], but not (yet) back-ported
to 2.6.32-longterm. [2] fixed a bug with TLB flushing when using PAE, which
made the hidden bug trigger a lot more often. It only happens with a
PAE-enabled guest kernel running on two or more CPUs.
Full details are in our German Bugzilla [3].
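For readers who hit the same hang, here is a toy model of why it never resolves. It rests on our reading of [1], so treat it as an assumption. In 2.6.32, pgd_alloc() holds pgd_lock, taken with spin_lock_irqsave(), while pgd_prepopulate_pmd() triggers the TLB-flush IPI. Every other CPU spinning on that lock also has interrupts disabled, so none of them can run the IPI handler that the holder waits for in flush_tlb_others_ipi(). [1] takes pgd_lock with plain spin_lock() instead. The C sketch below is not kernel code; struct cpu, setup_cpus() and wait_for_flush() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the hang, NOT kernel code. All names are hypothetical. */
struct cpu {
    bool waits_for_pgd_lock; /* spinning in spin_lock*(&pgd_lock) */
    bool irqs_disabled;      /* local interrupts off, so IPIs stay pending */
};

/* Put every CPU except `holder` into the "spinning on pgd_lock" state.
 * irqsave == true models spin_lock_irqsave(&pgd_lock, ...) as in 2.6.32;
 * irqsave == false models plain spin_lock(), as we understand fix [1]. */
static void setup_cpus(struct cpu *cpus, int ncpus, int holder, bool irqsave)
{
    for (int i = 0; i < ncpus; i++) {
        cpus[i].waits_for_pgd_lock = (i != holder);
        cpus[i].irqs_disabled = irqsave;
    }
}

/* Mimics the wait loop in flush_tlb_others_ipi():
 *     while (!cpumask_empty(flush_cpumask)) cpu_relax();
 * A target clears its bit only if its IPI handler can run, i.e. only if its
 * interrupts are enabled; being stuck on pgd_lock does not matter as long
 * as IRQs stay on. The real loop spins forever; this model gives up after
 * `rounds` passes and returns the bits that are still set. */
static uint32_t wait_for_flush(const struct cpu *cpus, int ncpus,
                               uint32_t flush_mask, int rounds)
{
    assert(ncpus <= 32); /* one bit per CPU in a 32-bit mask */
    for (int r = 0; r < rounds && flush_mask != 0; r++) {
        for (int i = 0; i < ncpus; i++) {
            if ((flush_mask & (UINT32_C(1) << i)) && !cpus[i].irqs_disabled)
                flush_mask &= ~(UINT32_C(1) << i); /* IPI handled and acked */
        }
    }
    return flush_mask;
}
```

With setup_cpus(cpus, 4, 2, true) and a flush mask of UINT32_C(1) << 1 (loosely mirroring flush_cpumask = {1} above), wait_for_flush() returns the mask unchanged, so the lock holder would spin forever. With irqsave set to false, the mask drains to 0 on the first pass.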

[1] <http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;h=a79e53d85683c6dd9f99c90511028adc2043031f>
[2] <http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;h=4981d01eada5354d81c8929d5b2836829ba3df7b>
[3] <https://forge.univention.org/bugzilla/show_bug.cgi?id=26661>

Sincerely
Philipp
-- 
Philipp Hahn           Open Source Software Engineer      hahn@xxxxxxxxxxxxx
Univention GmbH        be open.                       fon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen                 fax: +49 421 22 232-99
                                                   http://www.univention.de/