On Wed, Jan 20, 2010 at 3:45 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> Another quick fix is to remove all physical NICs except for one from the
> virtual switch(es) that are being bridged. You lose redundancy, but that will
> get them up fast until a fix is found.

Wouldn't spanning tree do the same thing, but with automated recovery in
the case of a failure of the live link?

>
> Robert LeBlanc
> Life Sciences & Undergraduate Education Computer Support
> Brigham Young University
>
>
> On Wed, Jan 20, 2010 at 2:29 PM, Brad Hudson <hudson@xxxxxxxxxxx> wrote:
>>
>> Robert;
>>
>> I looked over your site and could not find the document you reference in
>> your links. Can you provide the URL to get to it?
>>
>> I'll be happy to pass along anything I find and would appreciate it if
>> you would do the same. As the client having the issue is using it for
>> production we may need to move to a non-bridged variety of transparent
>> firewall with proxy_arp to get them back up quickly. Ideally I would
>> like to avoid that, but it's prod and needs to work.
>>
>> Regards;
>>
>> Brad
>>
>> Robert LeBlanc wrote:
>> > On Wed, Jan 20, 2010 at 1:04 PM, Brad Hudson <hudson@xxxxxxxxxxx
>> > <mailto:hudson@xxxxxxxxxxx>> wrote:
>> >
>> > Hi all;
>> >
>> > I have an odd problem that I have been dealing with for a week. I was
>> > hoping someone could help, or point me in the right direction for clues.
>> >
>> > I have a standard bridge setup. br0 is composed of eth0 and eth1.
>> >
>> > # brctl show br0
>> > bridge name     bridge id               STP enabled     interfaces
>> > br0             8000.000c292280b9       no              eth0
>> >                                                         eth1
>> >
>> > Eth0 and eth1 both have 0.0.0.0 (no) address assigned and are up. br0
>> > is assigned the proper IP and the routing table is correct. STP is off.
>> >
>> > I have been losing connectivity to hosts inside the local segment of the
>> > bridge. Some investigation has revealed that the problem is related to
>> > arp not working correctly.
>> > Arp packets going this way
>> >
>> > eth1->br0->eth0->network/internet
>> >
>> > have no problems at all. The replies coming back the other way all get
>> > to br0, but only 33% (approx, it varies) make it to the eth1 side of the
>> > bridge. I have verified this traffic pattern by tcpdump of arp packets
>> > through each of these devices while doing an nmap -sP of the /24 network
>> > to generate both arp and icmp. We are not able to arp any host outside
>> > our local segment, including the default gateway (which is owned by the
>> > co-lo). nmapping from the bridging server itself from interface br0
>> > gets the correct number of arp replies.
>> >
>> > ebtables and arp_tables are not running, and adding them in has had no
>> > change in result. There was a server with 2 NICs, each with an IP on
>> > the same subnet, that was causing some MAC flapping but that has been
>> > fixed and no change to the described behaviour. All items in
>> > /proc/sys/net/bridge are set to '1', but setting them to '0' has no
>> > effect. The server hosting the bridge has been rebooted several times
>> > with no effect. proxy_arp does not help at all. I also tried
>> > parprouted with no success.
>> >
>> > A couple other notes.
>> >
>> > - This behaviour suddenly appeared about a week ago. I think this is
>> > probably related to an increase in network traffic but it's hard to say,
>> > the client does not buy into that statement. If it was a matter of 0
>> > work or all work then there's places to look for that, but in this case
>> > the problem is intermittent and the lost arp replies are not the same
>> > every time.
>> > - In another test we found that if we ping the inside server from the
>> > firewall and also from an external machine the connectivity to the
>> > inside server dies. Once the pings are stopped, the connectivity
>> > eventually returns.
>> > If I ping out from the inside server while doing
>> > that test, the session keeps going through without hanging.
>> > - The firewall is a VM running under ESX. The vmxnet driver has been
>> > reinstalled and the pcnet32 driver is not loaded. Both NICs are virtual
>> > so there is no chance of failed hardware, though I suppose the problem
>> > could be on the ESX layer. I have made some attempt to diagnose the ESX
>> > layer but nothing jumps out at me.
>> >
>> > I have been watching tcpdumps and do not see any sign of frags, dupes,
>> > or anything that would cause lost packets. I have combed the
>> > newsgroups, google and even irc looking for clues or similar situations,
>> > but nothing I have found fits the profile.
>> >
>> > The workaround we currently have in place is to make a static arp entry
>> > for the gateway on all servers on the inside. This is not ideal because
>> > the co-lo controls the router and it could fail over to another device
>> > which would kill our route again.
>> >
>> > Can anyone suggest anyplace I can look for clues, settings I should
>> > check or other? I am out of ideas at this point.
>> >
>> > Your help is very much appreciated.
>> >
>> > Regards;
>> >
>> > Brad
>> >
>> >
>> >
>> > --
>> > Brad Hudson
>> > SA Team Lead
>> > The Pythian Group - love your data
>> > Desk: 613-565-8696 x202
>> > IM: pythianhudson
>> >
>> >
>> > I assume you have multiple physical NICs connected to your virtual
>> > switch. If so I've posted my findings on my web page
>> > http://robert.leblancnet.us and I've posted a message to this forum two
>> > days ago entitled "Need help writing ebtables rules". I'm not sure my
>> > messages are getting through as I've sent a few messages with no one
>> > responding. If we can work together to solve the problem, we can both
>> > benefit.
>> >
>> > Thanks,
>> >
>> > Robert LeBlanc
>> > Life Sciences & Undergraduate Education Computer Support
>> > Brigham Young University
>> >
>>
>>
>>
>> --
>> Brad Hudson
>> SA Team Lead
>> The Pythian Group - love your data
>> Desk: 613-565-8696 x202
>> IM: pythianhudson
>
>
> _______________________________________________
> Bridge mailing list
> Bridge@xxxxxxxxxxxxxxxxxxxxxxxxxx
> https://lists.linux-foundation.org/mailman/listinfo/bridge
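
For anyone following along, two of the measures mentioned in this thread (turning on spanning tree for br0, and the static ARP entry for the gateway that Brad uses as a workaround) might look roughly like the sketch below. This is only a sketch under assumptions: the gateway IP and MAC are placeholder values from documentation ranges, not values from the thread, and the `run` helper and the `DRY_RUN` variable are hypothetical names introduced here. By default it only prints the commands. Nothing in the thread establishes that STP on the Linux bridge actually resolves the problem when the uplinks sit behind an ESX virtual switch.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Turn on spanning tree for the named bridge (br0 in the thread).
enable_stp() {
    run brctl stp "$1" on
}

# Pin a static ARP entry for the gateway (the workaround Brad describes).
# $1 = gateway IP, $2 = gateway MAC; both are placeholders below.
pin_gateway_arp() {
    run arp -s "$1" "$2"
}

enable_stp br0
pin_gateway_arp 192.0.2.1 00:00:5e:00:53:01
```

Running it with DRY_RUN=0 as root would apply the changes; as Brad notes, the static entry stops working if the co-lo router fails over to a device with a different MAC.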