Lockups with tg3 driver

(I hope this is the right list to report this kind of issue?)

I'm hitting some problems with a dual-port Broadcom gigabit network adapter, and am trying to work out whether it's a problem with the network driver or the adapter itself. The two ports are built onto a Tyan Thunder K8SR S2881 motherboard with two Opteron 248 processors, running Fedora Linux. The adapter identifies itself as a Broadcom Corporation NetXtreme BCM5704, PCI-X 100MHz.

The problem is: if I heavily load both network ports, either with lots of short TCP connections averaging around 100Mbit/s of traffic in and out of both ports, or with a smaller number of long-lived TCP connections at ~1000Mbit/s in and out, eth0 and/or eth1 will simply disappear off the network after around ten seconds. The syslog shows:

tg3: eth1: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
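
When comparing runs across kernels, it can help to tally which interface reset and which block offsets timed out. The following is a minimal sketch, assuming the messages look exactly like the ones quoted above; the function name `summarize_tg3_log` and the `SAMPLE` list are made up for illustration and are not part of any driver tooling.

```python
import re
from collections import Counter

# Patterns for the two tg3 message shapes quoted in this report.
RESET_RE = re.compile(r"tg3: (eth\d+): transmit timed out, resetting")
BLOCK_RE = re.compile(
    r"tg3: tg3_stop_block timed out, ofs=([0-9a-fA-F]+) enable_bit=(\d+)"
)


def summarize_tg3_log(lines):
    """Count transmit-timeout resets per interface and stop_block
    timeouts per register offset (offsets are kept as strings)."""
    resets = Counter()
    blocks = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
            continue
        match = BLOCK_RE.search(line)
        if match:
            blocks[match.group(1)] += 1
    return resets, blocks


# Hypothetical sample: three lines copied from the syslog excerpt above.
SAMPLE = [
    "tg3: eth1: transmit timed out, resetting",
    "tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2",
    "tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2",
]

resets, blocks = summarize_tg3_log(SAMPLE)
print("resets per interface:", dict(resets))
print("stop_block timeouts per offset:", dict(blocks))
```

To run it against a real log, pass an open file object (for example the syslog file on the affected machine) instead of `SAMPLE`; any iterable of lines works.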

If I ifconfig an affected interface down and then up, it reappears on the network.
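
For anyone wanting to reproduce the short-connection case, here is a rough sketch of that kind of load. The tool used for the tests in this report is not specified, so everything here is an assumption: the function name `short_connection_load`, the payload size, and the choice to open connections one after another. It needs something listening on the target host and port, and several copies would have to run in parallel against both ports to get anywhere near the traffic levels described.

```python
import socket


def short_connection_load(host, port, connections,
                          payload=b"x" * 1024, timeout=5.0):
    """Open `connections` short-lived TCP connections in sequence,
    send `payload` on each, and close it again.

    Returns the number of connections that completed without error.
    """
    succeeded = 0
    for _ in range(connections):
        try:
            # create_connection resolves the address and connects;
            # the with-block closes the socket when it exits.
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(payload)
            succeeded += 1
        except OSError:
            # Refused, timed out, or reset: count as a failure and carry on,
            # since a dying interface is exactly what we want to observe.
            pass
    return succeeded
```

A falling success count part-way through a run, together with the tg3 messages in the syslog, would line up with the interface dropping off the network.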

I have tried a variety of kernels, from 2.6.5 to 2.6.8-rc2, with SMP both enabled and disabled, to see if there is any change in behaviour. The result is generally the same.

Sometimes the machine has locked up completely when the network dies, with no message on the console. Once, I got a kernel BUG message with 2.6.8-rc2, but unfortunately I lost the details :( (The only bit I remember is "BUG at tg3.c:2268".)

I've also tried Broadcom's own(?) drivers, version 7.1.22, and the results are basically identical - loss of network after a short time, often with a complete kernel lockup.

Finally, I also tried putting in another PCI-X gigabit card, a Broadcom Corporation NetXtreme BCM5701. This card seems to work perfectly in the machine using exactly the same drivers (although since it only has the one port, there is half the network throughput during testing).

So it could be that the on-board networking is just screwed, or maybe the tigon3 drivers aren't happy with its particular revision of network adapter. Is there anything I can do to discover which it is? Obviously I'd be dead happy if the network problems could be tracked down to a kernel bug, but if there's a way to show that the network adapter is at fault then it would be cool too.

Thanks,
Ben
(please cc me on any replies!)


Network card details: (lspci -v)

02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
Subsystem: Broadcom Corporation: Unknown device 1644
Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 24
Memory at fc9c0000 (64-bit, non-prefetchable) [size=fc9a0000]
Memory at fc9b0000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at 00010000 [disabled]
Capabilities: [40] PCI-X non-bridge device.
Capabilities: [48] Power Management version 2
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-


02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
Subsystem: Broadcom Corporation: Unknown device 1644
Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 25
Memory at fc9f0000 (64-bit, non-prefetchable) [size=fc9d0000]
Memory at fc9e0000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at 00010000 [disabled]
Capabilities: [40] PCI-X non-bridge device.
Capabilities: [48] Power Management version 2
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-


Broadcom's driver dmesg:
Broadcom Gigabit Ethernet Driver bcm5700 with Broadcom NIC Extension (NICE) ver. 7.1.22 (01/07/04)
ACPI: PCI interrupt 0000:02:09.0[A] -> GSI 24 (level, low) -> IRQ 24
eth0: Broadcom BCM5704 1000Base-T found at mem fc9c0000, IRQ 24, node addr 00e081295040
eth0: Broadcom BCM5704 Integrated Copper transceiver found
eth0: Scatter-gather ON, 64-bit DMA ON, Tx Checksum ON, Rx Checksum ON, 802.1Q VLAN ON, TSO ON
ACPI: PCI interrupt 0000:02:09.1[B] -> GSI 25 (level, low) -> IRQ 25
eth1: Broadcom BCM5704 1000Base-T found at mem fc9f0000, IRQ 25, node addr 00e081295041
eth1: Broadcom BCM5704 Integrated Copper transceiver found
eth1: Scatter-gather ON, 64-bit DMA ON, Tx Checksum ON, Rx Checksum ON, 802.1Q VLAN ON, TSO ON

tg3 driver dmesg:
tg3.c:v3.8 (July 14, 2004)
ACPI: PCI interrupt 0000:02:09.0[A] -> GSI 24 (level, low) -> IRQ 24
eth0: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:29:50:40
eth0: HostTXDS[1] RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
ACPI: PCI interrupt 0000:02:09.1[B] -> GSI 25 (level, low) -> IRQ 25
eth1: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:29:50:41
eth1: HostTXDS[1] RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
