Re: Help/Advice with Ethernet NAT or "hub-mode" bridge



Here's my current (working!) solution, but I feel I shouldn't have to
jump through *this* many hoops (see below) to make it work; there should
be an easier, less painful way to pull it off! :)

On Fri, Mar 31, 2023 at 05:52:44PM -0400, Gabriel L. Somlo wrote:
> I have several VMs networked together on a cloud-based hypervisor
> solution, where the "vswitch" connecting the VMs enforces a strict
> "one MAC per VM network interface" policy.
> Typically, one of the VMs has no problem being the "default gateway"
> on such a "vswitch", serving all other VMs connected to the same
> virtualized "LAN" switch.
> In my case, the default gateway is inside a container running inside
> a network simulator on one of the VMs (many containers in that simulation
> are used to connect groups of VMs on this "router's" several interfaces
> across a simulated multi-hop "internet").
> The trouble is, if I use the simulator VM's interfaces as bridge ports
> into the simulation, the container-as-default gateway will have its
> traffic dropped by the vswitch outside its host VM. Here's an ASCII
> picture of the setup: 
> -----------------------------
> VM running simulation       |
>                             |
> sim. node,                  |
> (container),                |
> dflt gateway                |
> -----------    - br0 -      |             -----------------
>           |   /       \     |  inter-VM   | External VM   |
>      eth0 + veth0    ens32  +-- vswitch --+ using in-sim  |
>   Sim.MAC |          VM.MAC |             | dflt. gateway |
> -----------                 |             -----------------
> -----------------------------
> IOW, the "inter-VM vswitch" only allows <VM.MAC> ethernet frames
> from/to the VM running the simulation.

#1. On the simulator VM, create a veth pair (`vi` facing the container):

	ip link add vi0 type veth peer name vo0

#2. create a bridge between "outward" facing `vo0` and `ens32`:

	ip link add br0 type bridge
	ip link set vo0 master br0 
	ip link set ens32 master br0 

#3. bring up the "outward" facing bridge and its ports:

	ip link set dev br0 up
	ip link set dev vo0 up
	ip link set dev ens32 up

#4. assign `vi0` as the "bridge" interface in the Net.Sim. (e.g., the
#   gns3 or CORE network simulators).

#5. after Net.Sim. starts, we have a situation like the following:

 |-------------       bXYZ          br0      |               ---------
 || container |       /  \         /   \     |               | other |
 ||      eth0 + vethXYZ  vi0 --- vo0   ens32 + -- vswitch -- + guest |
 ||           |                      Pub.MAC |               | VM(s) |
 |-------------               |              |               ---------
 | < controlled by Net.Sim.>  | <manual conf>| 
 |                                           |
 |                 Simulator VM              |

#6. Set up "double MAC NAT" allowing container `eth0` to use `Pub.MAC`:

	ebtables -t nat -F
	ebtables -t nat -A PREROUTING  -i ens32 -d <Pub.MAC> \
	         -j dnat --to-destination de:ad:be:ef:00:01
	ebtables -t nat -A POSTROUTING -o ens32 -s de:ad:be:ef:00:01 \
	         -j snat --to-source <Pub.MAC>
	ebtables -t nat -A PREROUTING  -i vi0   -d de:ad:be:ef:00:01 \
	         -j dnat --to-destination <Pub.MAC>
	ebtables -t nat -A POSTROUTING -o vi0   -s <Pub.MAC> \
	         -j snat --to-source de:ad:be:ef:00:01

# NOTE: If traffic arrives on a bridge with a destination MAC belonging
#       to one of its own ports (a "permanent" FDB entry), it will not
#       be forwarded. Therefore `de:ad:be:ef:00:01` is substituted for
#       <Pub.MAC> on the `vi0` <--> `vo0` link, and NAT-ed back to the
#       real <Pub.MAC> after the two bridges have been "tricked" into
#       forwarding the frame!
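
For anyone wanting to reuse step 6 with other interface names, here is a
minimal sketch that only *prints* the four rules instead of running them.
The function name `mac_nat_rules` and its argument order are made up for
illustration; the rules themselves are the same as above, with the port
names and MAC addresses passed in.

```shell
#!/bin/sh
# Sketch only: print (do not execute) the four "double MAC NAT" rules.
# Hypothetical helper; arguments are supplied by the caller:
#   $1 = outward-facing port on br0 (e.g. ens32)
#   $2 = inward-facing veth end on the simulator bridge (e.g. vi0)
#   $3 = the MAC the vswitch allows (<Pub.MAC>)
#   $4 = substitute MAC used on the vi0 <--> vo0 link
mac_nat_rules() {
    outer=$1 inner=$2 pub=$3 alt=$4
    # Inbound from the vswitch: hide <Pub.MAC> before br0 sees it.
    echo "ebtables -t nat -A PREROUTING -i $outer -d $pub -j dnat --to-destination $alt"
    # Outbound to the vswitch: restore <Pub.MAC> as the source.
    echo "ebtables -t nat -A POSTROUTING -o $outer -s $alt -j snat --to-source $pub"
    # Entering the simulator bridge: restore <Pub.MAC> as the destination.
    echo "ebtables -t nat -A PREROUTING -i $inner -d $alt -j dnat --to-destination $pub"
    # Leaving the simulator bridge: hide <Pub.MAC> as the source.
    echo "ebtables -t nat -A POSTROUTING -o $inner -s $pub -j snat --to-source $alt"
}
```

Once the printed rules look right, the output can be piped to `sh` as
root, e.g. `mac_nat_rules ens32 vi0 <Pub.MAC> de:ad:be:ef:00:01 | sh`.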

#7. Set <Pub.MAC> as the MAC address of the container's `eth0`:

	ip link set dev eth0 down
	ip link set dev eth0 address <Pub.MAC>
	ip link set dev eth0 up

#8. Restart dhcp inside the container, and we're good to go!

# The Net.Sim. can have multiple containers assigned to multiple ens*
# interfaces, with multiple "enclaves" connected to different
# vswitches. Each "enclave" vswitch will see the simulator VM
# communicate using its assigned MAC address, but that traffic will
# actually originate from each respective "passed-through" container.
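
With several enclaves, steps 1 to 3 have to be repeated once per `ens*`
interface. A rough sketch of how that could be generated follows; the
per-enclave names `vi<N>`, `vo<N>` and `br<N>` and the helper
`enclave_cmds` are my own naming assumption, not something the simulator
requires. As with the rule generator, it only prints the commands.

```shell
#!/bin/sh
# Sketch only: print the veth/bridge setup (steps 1-3) for one enclave.
# Hypothetical helper; $1 = enclave index, $2 = the VM interface
# attached to that enclave's vswitch (e.g. ens32).
enclave_cmds() {
    n=$1 nic=$2
    cat <<EOF
ip link add vi$n type veth peer name vo$n
ip link add br$n type bridge
ip link set vo$n master br$n
ip link set $nic master br$n
ip link set dev br$n up
ip link set dev vo$n up
ip link set dev $nic up
EOF
}
```

For example, `enclave_cmds 0 ens32; enclave_cmds 1 ens33` prints the
setup for two enclaves (`ens33` being a made-up second interface); each
`vi<N>` is then handed to the Net.Sim. as in step 4.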

Anyway, once I realized that:

	- a single bridge refuses to forward frames destined to
	  addresses present as "permanent" in its own fdb,

	- snat is only available in POSTROUTING,

	- dnat (for forwarded frames in the nat table) is only available
	  in PREROUTING,

I decided to add an extra bridge hop and translate <Pub.MAC> back and
forth, to allow the inner container `eth0` to also use it, thus
solving the issue of ARP packets having mismatched "inner" and "outer"
MAC addresses for the default gateway :)
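
To see which addresses a bridge treats as "permanent" (and will therefore
refuse to forward to), the output of `bridge fdb show` can be filtered.
A small sketch, assuming the usual iproute2 output where each line starts
with the MAC and carries `master <bridge>` and `permanent` keywords; the
helper name `permanent_macs` is hypothetical.

```shell
#!/bin/sh
# Sketch only: read `bridge fdb show` output on stdin and print the MACs
# that are "permanent" entries of the bridge named in $1.
# Assumes lines look like: "<mac> dev <port> master <bridge> permanent".
permanent_macs() {
    awk -v br="$1" '{
        m = 0; p = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "master" && $(i + 1) == br) m = 1
            if ($i == "permanent") p = 1
        }
        if (m && p) print $1
    }'
}
```

Usage on the simulator VM would be `bridge fdb show | permanent_macs br0`;
if <Pub.MAC> shows up in that list, frames addressed to it will not be
forwarded across `br0` without the substitution trick.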

If anyone else knows of a way to further "dumb down" a bridge to the
point where it can be convinced to ignore its "permanent" fdb entries
when making a forwarding decision, I can further simplify this setup.

Thanks much,

PS. Figured I'd post my current solution in case anyone else ends up
looking for a neat workaround to a problem similar to mine, assuming
nothing cleaner and simpler becomes known or available :)
