RE: acpi nic card flapping

Thomas Renninger <trenn@xxxxxxx> · Tue, 28 Nov 2006 13:00:50 +0100

On Fri, 2006-11-24 at 23:34 -0800, Yong Lee wrote:
> Hi all,
> 
> I’m hoping that someone out there can lend me a hand with a problem that we
> were seeing.  I’m not very familiar with the acpi tool so please bear with
> me.
> 
> We had an outage where we could not ssh into our web server and we had to do
> a reboot from our console to get things running again.  It looks like an
> acpi problem and I’m trying to figure out what was going on.  Was ACPI going
> crazy, or was it trying to report a problem condition that we were not aware
> of?
> 
> What we saw in the dmesg log was this :
> 
> shpchp: Address64 -------- Resource unparsed
> shpchp: acpi_pciehprm:\_SB_.PCI0.PBLO OSHP fails=0x5
> shpchp: acpi_shpchprm:   Slot sun(0) at s:b:d:f=0x00:04:1f:00
> shpchp: acpi_pciehprm:\_SB_.PCI0.PBLO OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.PBLO OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.PBLO OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.PBLO OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.PBLO OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.PBLO OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.PBLO OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.VPR0 OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.VPR0 OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.VPR0 OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.VPR0 OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.VPR0 OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.VPR0 OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.VPR0 OSHP fails=0x5
> shpchp: acpi_pciehprm:\_SB_.PCI0.VPR0 OSHP fails=0x5
> shpchp: shpc_init : shpc_cap_offset == 0
> shpchp: shpc_init : shpc_cap_offset == 0
> shpchp: shpc_init : shpc_cap_offset == 0
> shpchp: shpc_init : shpc_cap_offset == 0
> shpchp: shpc_init : shpc_cap_offset == 0
> shpchp: shpc_init : shpc_cap_offset == 0
> shpchp: shpc_init : shpc_cap_offset == 0
> shpchp: shpc_init : shpc_cap_offset == 0
> shpchp: shpc_init : shpc_cap_offset == 0
> shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
> 
> During the time of the outage we saw from our router logs that the
> connection to the server was going up and down.
> 
> There were a lot of other messages on the console, but our sysadmin didn’t
> capture them.
Hmm, so that may not be the root cause of your problems?
> 
> We’re running Red Hat Linux (kernel 2.6.9-34.0.2.ELsmp) on Intel Xeon
> processors.  We have 2 Intel NICs: Intel Corporation 82541GI/PI Gigabit
> Ethernet Controller (rev 05)
> 
> Any light you can shed on this problem would be great.  Note that while the
> kacpid kernel thread is running the acpid daemon was shut off during this
> incident.
If the PCI hotplug module (shpchp, difficult to spell...) really causes
this, it might be a kernel or a BIOS bug. If this is a production machine
that has already been running for a while, I would not risk a BIOS update
or waste time with kernel compilations. The simplest approach would be to
move the module shpchp.ko out of the
/lib/modules/xy/kernel/drivers/pci/hotplug/ directory, so that it can no
longer be loaded, provided you do not urgently need PCI hotplug.
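
Before moving anything, it may be worth checking whether shpchp is loaded
at all and where its file actually lives. The following is only a rough
sketch, not a tested procedure for your box: the helper names are made up
for illustration, and it assumes that "xy" in the path above stands for the
running kernel release and that /proc/modules is readable.

```python
import os
from pathlib import Path


def is_module_loaded(name, proc_modules_text):
    """Return True if `name` is listed in /proc/modules-style text.

    Each line of /proc/modules starts with the module name.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def find_module_files(name, modules_root):
    """Return sorted paths of `name`.ko (or compressed variants) under modules_root."""
    root = Path(modules_root)
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.rglob(name + ".ko*"))


if __name__ == "__main__":
    # Assumption: "xy" in the path above is the running kernel release.
    release = os.uname().release
    try:
        with open("/proc/modules") as fh:
            loaded = is_module_loaded("shpchp", fh.read())
    except OSError:
        loaded = False
    print("shpchp loaded:", loaded)
    for path in find_module_files("shpchp", os.path.join("/lib/modules", release)):
        print("candidate to move aside:", path)
```

The script only reports; it does not touch the module. Moving the file
somewhere outside /lib/modules by hand keeps the change easy to undo.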

Hope that works...

      Thomas
