I have this scenario with a dual-Intel gigabit machine setup with
channel bonding:
eth0 : MTU 9000, slave to bond0
eth1 : MTU 9000, slave to bond0
bond0: MTU 9000, bonding mode=5 (balance-tlb; also happens with mode=1, active-backup)
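For reference, a minimal sketch of how a bond like this might be brought up (interface names, the miimon value, and the use of ifenslave are my assumptions; my actual init scripts aren't shown here):

```shell
# Hypothetical setup sketch -- not necessarily the exact configuration in use.
# Load the bonding driver in balance-tlb (mode 5) with link monitoring.
modprobe bonding mode=5 miimon=100

# Bring the bond up with a jumbo MTU before enslaving.
ip link set bond0 mtu 9000 up

# Enslave both gigabit NICs; the slaves take on bond0's MTU.
ifenslave bond0 eth0 eth1

# Verify slave state and the active link.
cat /proc/net/bonding/bond0
```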
This box receives small broadcast packets on both interfaces and small
request packets on the primary interface, and sends huge packets out on
both interfaces. After letting this setup run for about 20
minutes, I can issue an 'ethtool -G eth0 rx 1024'. The interface will
reset, as expected. The bonding will fail over to eth1, as expected.
When eth0 is redetected, eth1 will start logging a huge number of overruns
(4-5k pkts/sec) and networking will effectively stop until eth1 is
disabled. A reboot is required to fix this problem. Unloading and
reloading the network driver or toggling its link status will not fix it.
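A quick way to watch for those overruns (this is just a monitoring snippet I'd suggest, not part of the original setup; it reads the RX fifo column of /proc/net/dev, which is what ifconfig reports as "overruns"):

```shell
# Print the per-interface RX overrun (fifo) counter from /proc/net/dev.
# Field layout after the interface name:
#   rx: bytes packets errs drop fifo frame compressed multicast ...
awk 'NR > 2 {
    sub(/^ */, "")          # strip leading whitespace
    split($0, f, /[: ]+/)   # f[1]=iface, f[6]=rx fifo (overruns)
    printf "%s rx_overruns=%s\n", f[1], f[6]
}' /proc/net/dev
```

Run under `watch -n1` to see whether the counter is climbing at the reported 4-5k packets/sec rate.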
This is the semi-reproducible case. It's actually a bigger problem for
me because the same situation can also occur spontaneously, for no
apparent reason, although it's often (not always) tied to a bonding failover.
Does anyone have any ideas? It doesn't happen with the tg3 driver, and
it doesn't happen if the interfaces aren't handling jumbo packets. The
system also needs to have been running for a while before the ethtool
trick will trigger it, but otherwise it's entirely reproducible.
Thank you for any information,
Philip
-
To unsubscribe from this list: send the line "unsubscribe linux-net" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html