Re: ARP flux vs. weak/strong ES model

Erik Auerswald <auerswal@xxxxxxxxxxxxxxxxx> · Mon, 18 Feb 2019 13:10:42 +0100

Hi,

On 2/18/19 03:18, Grant Taylor wrote:
On 2/17/19 5:37 PM, Erik Auerswald wrote:
I have only ever seen it on Linux.

Likewise.

I think that this assertion motivates looking at non-Linux systems, 
especially traditional routers, and whether they act as weak or strong 
end-systems. And then looking at their ARP handling, excluding Proxy ARP.

Before looking at other systems, I'd want to step back and think how 
weak vs strong end-systems /should/ behave regarding ARP flux.

As I see it, the strong ES model should preclude ARP flux, because an ES 
first telling "use this MAC to send to my IP X", but then discarding the 
packet sent based on that information, seems nonsensical.

In the weak ES model, telling another ES to send to MAC Y to deliver to 
IP X, although the interface with MAC Y uses IP Z≠X, seems OK, but not 
mandatory. To me this feels much like the use of Proxy ARP to work 
around misconfigured end-systems. Ignoring ARP requests on all 
interfaces but the one with the asked-for IP seems OK, too.
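
As an illustration only, the difference between the two models for 
incoming IP packets can be sketched as a toy Python model. The names 
Interface and accepts, and the example addresses, are made up for this 
sketch and are not taken from any real IP stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str
    mac: str
    ip: str


def accepts(interfaces, ingress_name, dst_ip, strong):
    """Return True if the host delivers a packet for dst_ip locally."""
    if strong:
        # Strong ES: dst_ip must be configured on the ingress interface.
        return any(i.name == ingress_name and i.ip == dst_ip
                   for i in interfaces)
    # Weak ES: dst_ip may be configured on any interface of the host.
    return any(i.ip == dst_ip for i in interfaces)


# Hypothetical host with two interfaces (documentation addresses).
host = [
    Interface("eth0", "02:00:00:00:00:0a", "192.0.2.1"),
    Interface("eth1", "02:00:00:00:00:0b", "198.51.100.1"),
]

# A packet for eth1's IP arrives on eth0.
print(accepts(host, "eth0", "198.51.100.1", strong=False))  # True
print(accepts(host, "eth0", "198.51.100.1", strong=True))   # False
```

In this model, a strong ES that answered an ARP request for 198.51.100.1 
on eth0 would then drop the very packet it asked for, which is the 
nonsensical combination described above.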

Aside:  I think of Proxy ARP as a form of routing.

Freely combining MAC and IP addresses in ARP replies looks quite similar 
to Proxy ARP to me.

After all, the Proxy ARP router is doing exactly what it would do to 
route a packet if it had naturally come to it / its MAC address.  All 
Proxy ARP is doing is responding to ARP requests for things on the other 
side of the router such that the packet does come to the router.

That is correct, but Proxy ARP will result in a router answering an ARP 
request that was not for the receiving interface, thus it could be 
confused with ARP flux in a test.

[...]
Again, any "traditional" router accepts IP packets directed to any of 
its interface IPs irrespective of the ingress interface. That is the 
basis for using a loopback address for router management or BGP 
sessions. In that case a router acts as an end-system as well.

That's not entirely true.  Especially when filtering / firewalling is in 
place to only allow traffic from specific interfaces.

Filtering / firewalling can be in effect in Linux as well, including for 
Ethernet (ebtables). That would most likely affect ARP flux and weak ES 
model behaviour as well, depending on rule set.

I've also viewed that as the traffic being routed through the device 
to the proper interface, which would then process it accordingly.  In, 
over / through and then up the IP stack instead of in and directly up 
the IP stack.

It is not observably "routed" in that all routing actions (L2 rewrite, 
TTL decrement) usually happen when sending the packet _out_ the egress 
interface.

[...]
Let's back up and discuss what is actually allowing ARP flux to happen.

As I understand it, the /flux/ comes from the fact that the MAC address 
that ARPing hosts get replies from changes and fluctuates.  Hence the name.

It's my understanding that this happens because Linux does not filter 
(in any meaning of the word) incoming ARP requests (or outgoing replies) 
based on the physical interface.  This is especially true when you have 
multiple interfaces in the same broadcast domain.

That is a logical explanation of the name "ARP flux".
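
A minimal sketch of that explanation, as a toy Python model rather than 
actual kernel behaviour: a host answers an ARP request on every 
interface that hears it, as long as the target IP is configured on any 
of its interfaces. All names (Interface, arp_replies, observed_mac), the 
addresses, and the assumption that an arbitrary reply "wins" in the 
requester's cache are made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str
    mac: str
    ip: str
    segment: str  # label of the broadcast domain the interface is on


def arp_replies(interfaces, segment, target_ip):
    """MACs answering an ARP request for target_ip sent on segment."""
    # No filtering by interface: any locally configured IP counts.
    if not any(i.ip == target_ip for i in interfaces):
        return []
    # Every interface that hears the broadcast sends its own reply.
    return [i.mac for i in interfaces if i.segment == segment]


def observed_mac(interfaces, segment, target_ip, rng):
    """MAC the requester ends up caching (assumed: arbitrary winner)."""
    replies = arp_replies(interfaces, segment, target_ip)
    return rng.choice(replies) if replies else None


# Two interfaces in the same broadcast domain, different subnets.
host = [
    Interface("eth0", "02:00:00:00:00:0a", "192.0.2.1", "lan"),
    Interface("eth1", "02:00:00:00:00:0b", "198.51.100.1", "lan"),
]

print(arp_replies(host, "lan", "192.0.2.1"))

# Repeated requests may resolve the same IP to different MACs.
rng = random.Random()
print({observed_mac(host, "lan", "192.0.2.1", rng) for _ in range(20)})
```

The second print usually shows both MAC addresses for the one IP, which 
is the fluctuation the name refers to.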

Aside:  The last time I tried to put two interfaces in the same subnet 
and connect them to the same broadcast domain on a Cisco, it would not 
allow me to do so.

Correct. Huawei VRP allows a special case (a loopback interface with a 
/32 IP inside an IP subnet with shorter prefix active on another 
interface, including Ethernet interfaces), but I have not yet found time 
to thoroughly test the behaviour of that. Other networking equipment I 
used did not allow two interfaces in the same (or overlapping) subnet(s).

I'd say it is somewhat independent of the weak ES model. It is a 
symptom of the Linux IP stack. That IP stack may be built around weak 
ES model ideas. Other IP stacks adhere to the weak ES model as well 
without exhibiting ARP flux.

Sorry for being pedantic, but I think we need to clearly define the 
configuration and behavior that we're discussing.

I say this because I think that "ARP flux" is a symptom of having a 
Linux box with two interfaces in the same broadcast domain, thus able to 
hear the same ARP request and that the flux comes from the ensuing race 
conditions as to which interface will be processed -and- reply first.

I feel like this same scenario is seldom played out in traditional 
network gear.  And if we want to have the discussion about this, we 
should configure said gear comparably and test how it behaves.

I will also state that Linux may well respond to ARP requests from an 
inside interface for IPs on the outside interface.  But in such a 
scenario, there is only one interface connected to the broadcast domain, 
thus there is nothing to flux over as it will always be the single 
possible interface.

So, let's define what the connections are, and how things are configured.

I'm stating two interfaces connected to the same broadcast domain, each 
with IPs in different subnets.  (Thus the broadcast domain is overloaded 
and has multiple subnets on it.)  I think there is a reasonable chance 
that the ARP flux symptoms can occur in this configuration.

I'm thinking Linux /kernel/ default (no distro sysctl modifications or 
kernel compilation tweaks).  I'm also thinking Proxy ARP is disabled.
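
To check whether a test box really runs with such defaults, a small 
Python sketch can read the ARP-related sysctls from procfs. It assumes 
the usual /proc/sys/net/ipv4/conf/<interface>/ layout and, to my 
knowledge, a kernel default of 0 for each of the four knobs listed; the 
function names are made up for this sketch.

```python
from pathlib import Path

# ARP-related per-interface sysctls to inspect.
ARP_KNOBS = ("arp_ignore", "arp_announce", "arp_filter", "proxy_arp")


def read_arp_sysctls(base="/proc/sys/net/ipv4/conf"):
    """Return {interface: {knob: value}} for each directory under base."""
    result = {}
    for iface_dir in sorted(Path(base).iterdir()):
        if not iface_dir.is_dir():
            continue
        values = {}
        for knob in ARP_KNOBS:
            path = iface_dir / knob
            if path.is_file():
                values[knob] = int(path.read_text().strip())
        result[iface_dir.name] = values
    return result


def non_default(settings):
    """List (interface, knob, value) triples that differ from 0."""
    return [(iface, knob, value)
            for iface, knobs in sorted(settings.items())
            for knob, value in sorted(knobs.items())
            if value != 0]


if __name__ == "__main__":
    base = Path("/proc/sys/net/ipv4/conf")
    if base.is_dir():
        for iface, knob, value in non_default(read_arp_sysctls(base)):
            print(f"{iface}: {knob} = {value}")
    else:
        print("no Linux procfs sysctl tree found")
```

An empty listing on a Linux box suggests none of these knobs have been 
changed from 0 by the distribution or the administrator.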

Do you agree?  Or do you want to alter the configuration?

I want to extend that scenario to include two interfaces A and B with 
different IP addresses connected to two separate broadcast domains. 
While that does not result in fluctuating ARP replies (ARP flux), it 
does result in ARP replies combining MAC of interface A with IP of 
interface B.

Both scenarios (two interfaces connected to one broadcast domain, two 
interfaces connected to separate broadcast domains) show symptoms of the 
same underlying cause. The name ARP _flux_ is more fitting to the first 
scenario, the second could be better described as "ARP confusion" (I 
made that name up just now).
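
The second scenario can be sketched the same way, again as a toy Python 
model with made-up names and addresses. The strict flag is an assumption 
of this sketch; it mimics answering only for an IP configured on the 
receiving interface, which is what I understand the Linux sysctl 
arp_ignore=1 to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str
    mac: str
    ip: str


def arp_reply(interfaces, ingress_name, target_ip, strict):
    """Return (sender MAC, sender IP) of the ARP reply, or None."""
    ingress = next(i for i in interfaces if i.name == ingress_name)
    if strict:
        # Answer only for an IP configured on the receiving interface.
        known = ingress.ip == target_ip
    else:
        # Answer for an IP configured on any interface of the host.
        known = any(i.ip == target_ip for i in interfaces)
    if not known:
        return None
    # The reply leaves via the ingress interface and carries its MAC.
    return (ingress.mac, target_ip)


# Interfaces A and B sit in two separate broadcast domains.
host = [
    Interface("A", "02:00:00:00:00:0a", "192.0.2.1"),
    Interface("B", "02:00:00:00:00:0b", "198.51.100.1"),
]

# A request for B's IP arrives on A.
print(arp_reply(host, "A", "198.51.100.1", strict=False))
# ('02:00:00:00:00:0a', '198.51.100.1')
print(arp_reply(host, "A", "198.51.100.1", strict=True))
# None
```

There is only ever one reply here, so nothing fluctuates, but the 
non-strict reply pairs the MAC of interface A with the IP of interface B.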

[...]
That being said, you do have me questioning things.  At the moment, I'm 
sticking with what I've thought for years.  But I am interested in 
continuing the conversation and learning, whatever the lesson may be.

Likewise.

Best regards,
Erik

Linux Advanced Routing and Traffic Control