Re: Help/Advice with Ethernet NAT or "hub-mode" bridge

"Gabriel L. Somlo" <gsomlo@xxxxxxxxx> · Sat, 1 Apr 2023 12:17:49 -0400

On Fri, Mar 31, 2023 at 06:11:36PM -0700, Payam Chychi wrote:
> Hey!
> 
> To your other idea:
> 
> Are you talking about just disabling MAC address learning for everything
> directly connected on the bridge, and then forcing all traffic to broadcast?
> 
> Flood mode is on by default, I think, for destinations not in the FDB.
> 
> bridge link set dev portX learning on|off
> Also look at learning_sync off
> 
> You can also disable all non-physical MAC learning with:
> brctl setageing brX 0

but:

	$ bridge fdb show br br0
	...
	<VM.MAC> dev ens32 vlan 1 master br0 permanent
	<VM.MAC> dev ens32 master br0 permanent

No amount of turning off learning will get rid of that, and the bridge
stubbornly refuses to forward frames with destination MAC <VM.MAC>
because it "knows" that address belongs to one of its ports!

I'm looking for a way to force a bridge to forward everything on all
ports except the ingress port, blindly, without going "but-but-but, I
know better, so maybe not this one" :)

Basically like a hub from back when dinosaurs were used to pull plows
on farms :)
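
To make the distinction concrete, here is a simplified Python model of the
two forwarding policies. It is a sketch, not the kernel's code: it assumes no
VLANs, STP, hairpin mode, or ebtables hooks, assumes flooding is enabled on
every port, and the names `FdbEntry`, `LOCAL`, `learning_bridge` and `hub` are
hypothetical. What it encodes is that a destination matching a local
("permanent") FDB entry is handed up to the bridge device itself rather than
forwarded out any port, and that entry exists whether or not learning is on.

```python
from dataclasses import dataclass

# Hypothetical marker meaning "hand the frame up to the bridge device's own stack".
LOCAL = "local"


@dataclass(frozen=True)
class FdbEntry:
    port: str
    is_local: bool  # True for the bridge's own port addresses ("permanent" local entries)


def learning_bridge(dst, ingress, ports, fdb):
    """Simplified learning-bridge decision for a unicast destination MAC."""
    entry = fdb.get(dst)
    if entry is None:
        # Unknown destination: flood out every port except the ingress port.
        return [p for p in ports if p != ingress]
    if entry.is_local:
        # One of the bridge's own addresses: deliver locally, forward nowhere.
        return [LOCAL]
    if entry.port == ingress:
        # Destination sits behind the port the frame came in on: filter it.
        return []
    return [entry.port]


def hub(dst, ingress, ports, fdb):
    """Hub behavior: ignore dst and the FDB, repeat out every other port."""
    return [p for p in ports if p != ingress]
```

With a single local entry for <VM.MAC> on ens32, a frame arriving on ens32 for
that address goes to the bridge's own stack under the learning policy, whereas
the hub policy would repeat it out veth0. The two functions share a signature
only so they can be swapped; `hub` deliberately ignores `dst` and `fdb`.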

> You can always turn off STP and make things as “dumb” as possible if you’re not
> worried about loops.

STP is off, and not part of the picture at all.

> I assume you’ve disabled the iptables hook and are relying only on the bridge?
> If not, you still need a basic iptables rule to allow traffic from/to brX
> on the forwarding plane.

ebtables (not iptables) would be an orthogonal solution. When I try
turning a bridge into a dumb hub, I have ebtables rules disabled
completely.

ebtables was my "plan b" once I saw I couldn't figure out how to make a
bridge "stupid" enough not to try to "help" me. I figured if I can't
beat 'em, I'd try to join 'em by using MAC address NAT. But then I
ran into the ARP-payload "application-level" translation problem.
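
The ARP part of the problem is that the MAC address appears twice: once in
the Ethernet header (which ebtables snat/dnat rewrite) and once inside the ARP
payload, in the sender/target hardware address fields (which they leave
alone). Below is a minimal Python sketch of the byte-level fix-up, assuming
untagged Ethernet, IPv4-over-Ethernet ARP (hlen=6, plen=4), and the 1:1
mapping between VM.MAC and Sim.MAC described in this thread. The helper names
are hypothetical, and where such a function could actually be hooked in
(NFQUEUE-style punting, a raw socket, something else) is exactly the open
question here.

```python
import struct

ETH_HLEN = 14      # untagged Ethernet header; no 802.1Q tag assumed
ETH_P_ARP = 0x0806
ARP_SHA = 8        # sender hardware address offset within the ARP header
ARP_THA = 18       # target hardware address offset within the ARP header
ARP_LEN = 28       # size of an Ethernet/IPv4 ARP header


def mac(s):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(s.replace(":", ""))


def rewrite_arp_macs(frame, old, new):
    """Replace MAC `old` with `new` in the Ethernet header and, for ARP frames,
    in the ARP sender/target hardware address fields as well.

    ARP carries no checksum, so nothing needs recomputing after the rewrite.
    """
    buf = bytearray(frame)
    # Ethernet destination (bytes 0..6) and source (bytes 6..12).
    for off in (0, 6):
        if buf[off:off + 6] == old:
            buf[off:off + 6] = new
    (ethertype,) = struct.unpack_from("!H", buf, 12)
    if ethertype == ETH_P_ARP and len(buf) >= ETH_HLEN + ARP_LEN:
        hlen, plen = buf[ETH_HLEN + 4], buf[ETH_HLEN + 5]
        if hlen == 6 and plen == 4:
            for field in (ARP_SHA, ARP_THA):
                off = ETH_HLEN + field
                if buf[off:off + 6] == old:
                    buf[off:off + 6] = new
    return bytes(buf)
```

Outbound, `rewrite_arp_macs(frame, sim_mac, vm_mac)` would fix both the
Ethernet source and the ARP sender address of the container's replies and
requests; inbound, the reverse mapping would fix the Ethernet destination and
the ARP target address of replies addressed to VM.MAC. This only holds under
the 1:1 premise, i.e. the host VM itself does not use VM.MAC for its own
traffic.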

If neither ebtables/nftables/whatever can be used to punt a frame to
userspace for "customization", nor a bridge can be dumbed down
sufficiently to where it stops using its "permanent" fdb entries to
decide to refuse forwarding frames, then I'm hosed, or maybe I have to
make custom changes to the bridging decisions in the kernel :)

I was hoping that wouldn't be necessary, and that maybe there's something
I could still toggle.

Thanks,
--Gabriel

> You can always enable proxy ARP on the bridge to help reachability.
> 
> This may not be helpful; looking at your diagram on my phone, I had a hard
> time lining things up 😂
> 
> On Fri, Mar 31, 2023 at 4:27 PM Gabriel L. Somlo <gsomlo@xxxxxxxxx> wrote:
> 
>     Thanks for the reply!
> 
>     On Fri, Mar 31, 2023 at 04:02:13PM -0700, Payam Chychi wrote:
>     > Hey Gabriel,
>     >
>     > I’m not sure the best way of achieving what you’re intending is by
>     > bridging at the VM down to the container.
>     >
>     > It’s probably not working as you think due to layer-2 loop-prevention
>     > mechanisms, which VM also has implemented within its architecture
>     > (for many years now).
>     >
>     > The MAC-address overwrite function is also there by default to maintain
>     > a stable network… look up proxy ARP, gratuitous ARP, and ARP poisoning
>     > as some common terms.
>     >
>     > Sure, you can fake ARP entries, but wow… this is not going to be a
>     > stable or reliable network. Take it from someone who designed massive
>     > data centers and did architecture and design for tier-1/2 network
>     > providers.
> 
>     In the rather specific topology I've shown (with a single container
>     interface "hidden" behind (bridged to) an outside-facing host-VM
>     network interface) it's basically a 1:1 translation, so I don't see
>     why it would be unstable or unreliable :)
> 
>     The question is, *is* there a way to NFQUEUE ebtables traffic to
>     userspace? If not, any insight into why that's only supported at layer-3?
> 
>     This is just a router VM, but instead of running Quagga/FRR on the VM
>     itself and being a single-hop L3 router across the VM-adjacent LANs,
>     I'm running many Quagga/FRR instances inside containers, so this would be
>     a single-VM router simulating many L3 hops. The point is still to present
>     a straightforward default gateway to the "outside" connected LANs, not to
>     design a massive datacenter architecture that presumes the "architect" gets
>     to dictate all the hoops that all the (presumably "cattle") client VMs
>     (presumably designed by the same "architect") must jump through...
> 
>     > There are many reasons why an L2VPN was probably recommended to you;
>     > it’s meant for things like your example.
> 
>     The *realism* of "I'm a normie computer, and there's a normie default
>     gateway on the LAN I'm connected to" for the client VMs is the *entire*
>     *point* of the exercise, that's why I'm stubbornly ignoring the "why
>     don't you just set up an l2 vpn thing for everyone" type advice... :)
> 
>     > There are also other protocols and architectures (L3VPN with additional
>     > encapsulation) you can use… but you should focus on your requirements
>     > and understand if/why an L2 won’t work for you.
> 
>     How about my other idea, of turning off enough of the (unwanted, by
>     me, in this particular case) "smarts" of a Linux bridge, so that it
>     blindly and stupidly forwards everything, ignoring "fdb entries" ?
>     This is a 2-port bridge, and all I want from it is that when a frame
>     enters over one port, it should be sent back out the other port(s).
>     Don't look at the FDB, don't decide to drop frames because the
>     destination mac address is permanently associated with the receiving
>     port, don't learn MAC-port associations from the frames, etc... Is
>     there still a way to make that work (there used to be, years back, IIRC)?
>     If not (anymore), then why not? :)
> 
>     Thanks again,
>     --Gabriel
> 
>     > On Fri, Mar 31, 2023 at 3:14 PM Gabriel L. Somlo <gsomlo@xxxxxxxxx> wrote:
>     >
>     >     Hi,
>     >
>     >     I have several VMs networked together on a cloud-based hypervisor
>     >     solution, where the "vswitch" connecting the VMs enforces a strict
>     >     "one MAC per VM network interface" policy.
>     >
>     >     Typically, one of the VMs has no problem being the "default gateway"
>     >     on such a "vswitch", serving all other VMs connected to the same
>     >     virtualized "LAN" switch.
>     >
>     >     In my case, the default gateway is inside a container running inside
>     >     a network simulator on one of the VMs (many containers in that
>     >     simulation are used to connect groups of VMs on this "router's"
>     >     several interfaces across a simulated multi-hop "internet").
>     >
>     >     The trouble is, if I use the simulator VM's interfaces as bridge ports
>     >     into the simulation, the container acting as default gateway will have
>     >     its traffic dropped by the vswitch outside its host VM. Here's an ASCII
>     >     picture of the setup:
>     >
>     >     -----------------------------
>     >     VM running simulation       |
>     >                                 |
>     >     sim. node,                  |
>     >     (container),                |
>     >     dflt gateway                |
>     >     -----------    - br0 -      |             -----------------
>     >               |   /       \     |  inter-VM   | External VM   |
>     >          eth0 + veth0    ens32  +-- vswitch --+ using in-sim  |
>     >       Sim.MAC |          VM.MAC |             | dflt. gateway |
>     >     -----------                 |             -----------------
>     >     -----------------------------
>     >
>     >     IOW, the "inter-VM vswitch" only allows <VM.MAC> ethernet frames
>     >     from/to the VM running the simulation.
>     >
>     >     I've been trying two different approaches:
>     >
>     >     1. assign VM.MAC to eth0 inside the container, overwriting Sim.MAC
>     >        (e.g., using `ip link set dev eth0 address <VM.MAC>` inside the
>     >        container).
>     >
>     >        I find that when I do that, `br0` will drop external incoming
>     >        frames to <VM.MAC> rather than forward them through `veth0`, and
>     >        that I can't find a way to force br0 to forward everything without
>     >        considering its permanent fdb entries.
>     >
>     >        If I could force br0 to act more like a hub (forward everything,
>     >        ignore the fdb, learn nothing, ever), I could get frames to
>     >        successfully travel between my container's eth0 and the external
>     >        VMs trying to use it as the default gateway. The frames would
>     >        have ens32's VM.MAC, which would satisfy the restrictive
>     >        hypervisor and vswitch policies.
>     >
>     >     2. use ebtables to NAT between ens32's VM.MAC and the container's
>     >        eth0's Sim.MAC:
>     >
>     >          ebtables -t nat -A PREROUTING \
>     >                -i ens32 -d <VM.MAC> -j dnat --to-destination <Sim.MAC>
>     >
>     >          ebtables -t nat -A POSTROUTING \
>     >                -o ens32 -s <Sim.MAC> -j snat --to-source <VM.MAC>
>     >
>     >        This will get frames to successfully cross the bridge with the
>     >        right MAC addresses in the Ethernet headers, but breaks ARP:
>     >
>     >          - when the container replies to ARP requests from external VMs,
>     >            the *payload* (inner) MAC address is still Sim.MAC, even
>     >            though the Ethernet frame's (outer) source MAC address has
>     >            been rewritten to VM.MAC.
>     >            The ebtables man page seems to indicate that using the
>     >            arpreply extension might take care of this, but so far I've
>     >            failed to get external ARP requests dropped by adding such a
>     >            rule; they still somehow obtain Sim.MAC as their default
>     >            gateway host's associated MAC, and things don't work.
>     >
>     >          - when the container itself sends out ARP requests for external
>     >            VMs' MAC addresses, it places its own Sim.MAC in the inner
>     >            source MAC field.
>     >
>     >        Would this be a situation in which I could (should) be able to use
>     >        the NFQUEUE target to "edit" packets myself in userspace?
>     >
>     >        There seems to be no NFQUEUE support in ebtables, unlike iptables.
>     >        Is that right, or am I missing something?
>     >
>     >        Is there any other way to dynamically "fix up" ARP to match the
>     >        changes made to the "outer" (Ethernet header) MAC addresses?
>     >
>     >     I've been advised to use a layer-2 VPN solution, but that would break
>     >     "realism" for the external client VMs, and, besides, I'm trying to
>     >     avoid imposing restrictions and requirements on them, since they're
>     >     independently developed and operated, and a "transparent" solution
>     >     where the default gateway is on the magic "router" VM, period, would
>     >     be a huge usability win.
>     >
>     >     Any ideas on what I'm missing, doing wrong, or should otherwise be
>     >     looking into would be much appreciated!
>     >
>     >     Thanks,
>     >     --Gabriel
>     >
>     > --
>     > Payam Tarverdyan Chychi
> 
> --
> Payam Tarverdyan Chychi