Hoping someone else has seen this before.

I have a few dozen Dell R610 systems running CentOS 5.2 with kernels from 5.3 and 5.4 (2.6.18-128.1.10.el5 and 2.6.18-164.6.1.el5) that at random lose layer 2 network connectivity, either partially or totally. Running tcpdump on the interface reveals only ARP broadcasts and no responses. The switch reports no packets being received on the interface. Systems can run for days, weeks, or even months without an issue and then drop off the network.

At first I thought it was the Dell switches, which we had lots of problems with, but it has happened on two other brands of switches as well (Cisco and Extreme), so I no longer believe it's the switch but rather the systems.

The workaround is to restart the network on the system. I have even configured the bonding driver to send ARP requests and fail over to the backup link when they go unanswered, but that wasn't successful either, since both links can go down and/or the system can go into a "degraded" state where it can reach some systems but not others.

I have ESXi systems running on the same hardware and to date have not seen any of them drop off the same way. The system can be under high traffic load at the time or completely idle; it doesn't seem to make a difference. There are no log entries indicating what might be going on.

I have a case open with Dell but am not expecting a whole lot from them; maybe I'll get lucky though. They asked me to upgrade the NIC firmware, which I did on a batch of systems to no avail (the release notes for the firmware said nothing about any fixes that sounded like my issue).

Driver versions:

ESXi (vSphere):
    Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.6.9 (December 8, 2007)
Most Linux systems (5.3 kernel):
    Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.7.9-1 (July 18, 2008)
Some Linux systems (5.4 kernel):
    Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.9.3 (March 17, 2009)

Happens across at least a dozen systems spread over 4 data centers.

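Since the only known workaround is restarting the network, that step could in principle be automated until the root cause is found. What follows is only a rough sketch of such a watchdog, not something taken from the setup described here. The target addresses, thresholds, and function names are all placeholders, and it assumes a Python 3 interpreter is installed (the stock Python on CentOS 5 is too old for it). It probes several targets because a host can end up in the "degraded" state where some peers are reachable and others are not.

```python
import subprocess
import time

# Hypothetical values: replace with a gateway and a few peers on the local segment.
TARGETS = ["192.0.2.1", "192.0.2.10"]
FAILS_BEFORE_RESTART = 3        # consecutive bad rounds before acting
CHECK_INTERVAL_SECONDS = 30


def reachable(ip):
    """Return True if a single ping to ip is answered within 2 seconds."""
    return subprocess.call(
        ["ping", "-c", "1", "-W", "2", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ) == 0


def next_fail_count(fail_count, results):
    """Count consecutive rounds in which at least one target was unreachable."""
    return 0 if all(results) else fail_count + 1


def main():
    fails = 0
    while True:
        fails = next_fail_count(fails, [reachable(ip) for ip in TARGETS])
        if fails >= FAILS_BEFORE_RESTART:
            # Log a timestamp so outages can be correlated later.
            print("%s: connectivity lost, restarting network" % time.ctime())
            subprocess.call(["/sbin/service", "network", "restart"])
            fails = 0
        time.sleep(CHECK_INTERVAL_SECONDS)


if __name__ == "__main__":
    main()
```

A watchdog like this only hides the symptom, but the logged timestamps might help when comparing outages across hosts or data centers.
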
Never seen this sort of behavior before in the hundreds and hundreds of systems I've run. These systems are all new; the R610 hardware was released around May 2009, and we've been having issues since day 1, but only recently have we been able to rule the switches out as the cause.

The latest driver on Broadcom's site is 1.9.20b, which seems odd since CentOS 5.4 comes with 1.9.3 (the date on the Broadcom site is more recent than the date on the Linux kernel driver in 5.4). Most of the fixes in the recent driver versions seem to focus on iSCSI, which I'm not using.

lspci says:

02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
        Subsystem: Dell Unknown device 0236
        Flags: bus master, fast devsel, latency 0, IRQ 114
        Memory at dc000000 (64-bit, non-prefetchable) [size=32M]
        Capabilities: [48] Power Management version 3
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/4 Enable-
        Capabilities: [a0] MSI-X: Enable- Mask- TabSize=9
        Capabilities: [ac] Express Endpoint IRQ 0
        Capabilities: [100] Device Serial Number c9-dc-93-fe-ff-9b-21-00
        Capabilities: [110] Advanced Error Reporting
        Capabilities: [150] Power Budgeting
        Capabilities: [160] Virtual Channel

I suppose I could go build the latest driver from their site and see how it goes..

thanks

nate
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos