I have this scenario with a dual-Intel gigabit machine setup with
channel bonding:
eth0 : MTU 9000, slave to bond0
eth1 : MTU 9000, slave to bond0
bond0: MTU 9000, bonding mode=5 (balance-tlb; also happens with mode=1, active-backup)
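For reference, a minimal sketch of how a bond like this might be brought up (interface names, the miimon value, and the use of ifenslave are my assumptions; my actual init scripts aren't shown here):

```shell
# Hypothetical setup sketch -- not necessarily the exact configuration in use.
# Load the bonding driver in balance-tlb (mode 5) with link monitoring.
modprobe bonding mode=5 miimon=100

# Bring the bond up with a jumbo MTU before enslaving.
ip link set bond0 mtu 9000 up

# Enslave both gigabit NICs; the slaves take on bond0's MTU.
ifenslave bond0 eth0 eth1

# Verify slave state and the active link.
cat /proc/net/bonding/bond0
```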
This box receives small broadcast packets on both interfaces and small
request packets on the primary interface, and sends huge packets out on
both interfaces. After letting this setup run for about 20
minutes, I can issue an 'ethtool -G eth0 rx 1024'. The interface will
reset, as expected. The bonding will fail over to eth1, as expected.
When eth0 is redetected, eth1 will start logging a huge number of overruns
(4-5k pkts/sec) and networking will effectively stop until eth1 is
disabled. A reboot is required to fix this problem. Unloading and
reloading the network driver or toggling its link status will not fix it.
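A quick way to watch for those overruns (this is just a monitoring snippet I'd suggest, not part of the original setup; it reads the RX fifo column of /proc/net/dev, which is what ifconfig reports as "overruns"):

```shell
# Print the per-interface RX overrun (fifo) counter from /proc/net/dev.
# Field layout after the interface name:
#   rx: bytes packets errs drop fifo frame compressed multicast ...
awk 'NR > 2 {
    sub(/^ */, "")          # strip leading whitespace
    split($0, f, /[: ]+/)   # f[1]=iface, f[6]=rx fifo (overruns)
    printf "%s rx_overruns=%s\n", f[1], f[6]
}' /proc/net/dev
```

Run under `watch -n1` to see whether the counter is climbing at the reported 4-5k packets/sec rate.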
This is the semi-reproducible case. It's actually a bigger problem for
me because the same situation can also occur spontaneously, for no
apparent reason, although it's often (not always) tied to a bonding failover.
Does anyone have any ideas? It doesn't happen with the tg3 driver, and
it doesn't happen if the interfaces aren't handling jumbo packets. The
system also needs to have been running for a while before the ethtool
trick will trigger it, but otherwise it's entirely reproducible.
Thank you for any information,
Philip
-
To unsubscribe from this list: send the line "unsubscribe linux-net" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html