Re: Linux guest domain with two vnets bound to the same vswitch experiences hung in bootup (sun_netraT5220)

[ Please retain CC: in all replies, thanks. ]

Hey, I want to investigate this further because something about
these traces still perplexes me.

Could you get me some information?

1) Set up the failing case (but with one of the fixes in the kernel
   so you can run commands), and grab the contents of /proc/interrupts
   and post that output here.

2) What firmware and hypervisor are you running on this machine?
   (you can get this via 'showhost' at the "sc>" prompt)

   I'm running Sun System Firmware 7.1.7.h on my machine.

The reason I ask #2 is that there is a hypervisor bug with LDC
connections wherein the interrupt can be sent twice erroneously
and this can cause loops in the per-cpu interrupt INO list.

There is a partial workaround already in the tree:

commit 5a606b72a4309a656cd1a19ad137dc5557c4b8ea
Author: David S. Miller <davem@xxxxxxxxxxxxxxxxxxxx>
Date:   Mon Jul 9 22:40:36 2007 -0700

    [SPARC64]: Do not ACK an INO if it is disabled or inprogress.
    
    This is also a partial workaround for a bug in the LDOM firmware which
    double-transmits RX inos during high load.  Without this, such an
    event causes the kernel to loop forever in the interrupt call chain
    ACK'ing but never actually running the IRQ handler (and thus clearing
    the interrupt condition in the device).
    
    There is still a bad potential effect when double INOs occur,
    not covered by this changeset.  Namely, if the INO is already on
    the per-cpu INO vector list, we still blindly re-insert it and
    thus we can end up losing interrupts already linked in after
    it.
    
    We could deal with that by traversing the list before insertion,
    but that's too expensive for this edge case.
    
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

But, as stated, it cannot deal with all possibilities that result
from this firmware bug.  The best option is to run the most up-to-date
firmware with the fix.