Re: [LARTC] Virtual Routers would this work?

"Martin A. Brown" <mabrown-lartc@xxxxxxxxxxxxxx> · Sat, 1 Mar 2003 13:02:22 -0600 (CST)

Hi Matthew,

 :   I need a virtual firewall/router solution.  I'm thinking of a
 : netscreen 1000 but I want to know if it can be done in Linux.
 : Here is my idea:
 : 1 Linux box
 : 2 GigE interfaces

What's linux?

 : 1 interface setup with a public IP address ($PUBIP)
 : 1 interface setup with 802.1q VLAN trunking with 100 vlans assigned
 : ($VLAN1-$VLAN100)
 :
 : a /25 subnet routed to $PUBIP from my core routers
 :
 : All $VLAN interfaces setup with IP 192.168.1.1/24
 : IPs :  I'm sure the kernel will bitch about assigning 192.168.1.1 on a
 : bunch of interfaces.

That will not be a problem at all.  You can assign the same IP to all the
interfaces--it's confusing for the humans, but not the kernel.  The routes
are the trick....
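A minimal sketch of the interface side. It assumes eth1 is the trunk port, VLAN interfaces named vlan$ID, and a current iproute2 (2003-era systems used vconfig for this step). It only prints the commands, so they can be reviewed before being piped to sh as root. Note that each address also adds a 192.168.1.0/24 route to the main table, and effectively only one of those can win there, which is why the routes are the trick.

```sh
#!/bin/sh
# Print commands that create VLANs 1..100 on a trunk port and give every
# VLAN interface the same address.  TRUNK is an assumed interface name.
TRUNK=${TRUNK:-eth1}

gen_vlans() {
    for ID in $(seq 1 100); do
        echo "ip link add link $TRUNK name vlan$ID type vlan id $ID"
        echo "ip link set vlan$ID up"
        # Same address on every VLAN: legal, just confusing for humans.
        echo "ip addr add 192.168.1.1/24 dev vlan$ID"
    done
}

gen_vlans
```

Usage, after review: `sh vlans.sh | sh`.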

 : Inbound traffic on $VLAN gets marked with a fwmark ($VLAN1 = fw1,
 : $VLAN2 = fw2)
 : Outbound traffic gets NAT'ed based on the fwmark to an IP in the subnet
 : Returning traffic gets marked based on the dest IP (one of the subnets)
 : with the same fwmark for the appropriate VLAN

Clever use of fwmark.

 : returning packets are 'unNAT'ed' and then routed down the correct VLAN
 : based on the fwmark on the packet.

Something like this, then, right?

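 PUBIF=eth0    # public interface; the name is an assumption
 for ID in $( seq 1 100 ) ; do
   # mark return traffic by the public address it was SNATed to
   iptables -t mangle -A PREROUTING -i $PUBIF -d AAA.BBB.CCC.$ID \
     -j MARK --set-mark $ID
 done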
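The outbound half of the same idea might look like the sketch below, assuming eth0 as the public interface, vlan$ID naming, and AAA.BBB.CCC standing in for the /25 as above. The mark is needed because every VLAN shares 192.168.1.0/24, so the source address cannot tell them apart, and iptables does not allow the input interface (-i) to be matched in the nat POSTROUTING chain where SNAT happens. Again, the script only prints the rules.

```sh
#!/bin/sh
# Print the outbound rules: mark traffic by ingress VLAN, then SNAT each
# mark to its own public address.  PUBIF and PUBNET are placeholders.
PUBIF=${PUBIF:-eth0}
PUBNET=${PUBNET:-AAA.BBB.CCC}

gen_outbound() {
    for ID in $(seq 1 100); do
        # Mark on the way in, while the ingress interface is still known.
        echo "iptables -t mangle -A PREROUTING -i vlan$ID -j MARK --set-mark $ID"
        # SNAT on the way out, choosing the public address by mark.
        echo "iptables -t nat -A POSTROUTING -o $PUBIF -m mark --mark $ID -j SNAT --to-source $PUBNET.$ID"
    done
}

gen_outbound
```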

The problem I'd be concerned about (refer to the Kernel Packet Traveling
Diagram, KPTD [1]) would be possible interference from connection
tracking.  Perhaps someone more familiar with the workings of iptables can
address this concern.  Would the connection tracking mechanism keep the
return packet from traversing the PREROUTING mangle chain?  (Connection
tracking happens first, according to the KPTD....)

 : Questions:
 : How will Linux react if I put 192.168.1.1 on >1 interfaces?

No problem at all!  You'll just have to be smart about choosing output
interface (dev) with your routes.

 : Does the unNAT'ing of the packets destroy the fwmark?

No, but see above concern/question about connection tracking.

 : Is there a way of handling kernel based packets (ICMP, ARP responses)
 : so they go out the correct interface?

Yikes!  Good question on ICMP.  I have no idea about the interaction
between an inbound (already fwmark'd) packet and the generation of ICMP!

 : Example: an ARP (who has 192.168.1.1) from in on VLAN5,  How can I get
 : the kernel to send its response on VLAN5?

The ARP replies will go out the interface on which the query arrived.
You aren't doing anything "funky" with ARP are you?  Just straight up ARP?
No proxy ARPing or anything like that?

 : I see the packet flow as something like.
 :
 : Client (192.168.1.100) sends SYN to www.redhat.com:80
 : Client has default gw of 192.168.1.1
 : Client is on 802.1q VLAN10
 : Client puts packet on Ethernet VLAN10 with MAC address of Linux box
 : Packet enters Linux box on VLAN10 Source:ClientIP Dest:www.redhat.com:80
 : Packet gets marked by iptables rule.  FWMARK = 10
 : Packet gets routed out to upstream gateway
 : Packet gets NAT'ed to SUBNETIP10 based on FWMARK 10
 : Packet now looks like  src: SUBNETIP10:NATPORT  dst:REDHAT:80

Warning!  The fwmark does not survive the local box.  The fwmark feature
is an attribute of the in-memory representation of the packet as it's
handled by the linux router.  As soon as the packet has left the box, the
fwmark datum is lost.

Also, I was under the impression from above that the NAT would happen on
the 2 GigE linux box, not on an upstream router.  Which way would it be?

If there are two routers, you could use some sort of mangling scheme where you take
advantage of the ToS field to carry this information [2], but you'd then
need to strip it out at the SNATting box.  Public routers might not be
prepared to handle nonstandard data in the ToS field and might
consequently harm your data.

Another approach, assuming a separate upstream router.  I speculate
wildly.....

  - upstream router does all of the connection tracking
  - this router does packet rewriting with iproute2 and uses the mangle
    table only

  outbound (request):
  - this router NATs each vlan$ID-192.168.1.$host to 172.16.$ID.$host
  - transmitted across ethernet
  - upstream router SNATs 172.16.$ID.$host to AAA.BBB.CCC.$ID

  inbound (return):
  - upstream router unSNATs AAA.BBB.CCC.$ID to 172.16.$ID.$host per
    connection tracking mechanism
  - transmitted across ethernet
  - this router MARKs inbound packet with $ID
  - this router NATs 172.16.$ID.$host to 192.168.1.$host
  - RPDB lookup keyed to fwmark only to select routing table
  - routing table specifies output interface (vlan$ID)
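A small helper makes the address plan of this paper solution explicit. It only computes the address at each hop and does not configure anything; the function names are hypothetical, and AAA.BBB.CCC is again the placeholder for the public /25.

```sh
#!/bin/sh
# Address plan for the two-router idea: VLAN id + host octet -> addresses.
PUBNET=${PUBNET:-AAA.BBB.CCC}

# 192.168.1.$host on vlan$ID, as rewritten by this router.
transit_addr() { echo "172.16.$1.$2"; }

# Public address the upstream router SNATs that transit address to.
public_addr() { echo "$PUBNET.$1"; }

# Return path: recover the VLAN and client address from a transit address.
from_transit() {
    IFS=. read -r _ _ id host <<EOF
$1
EOF
    echo "vlan$id 192.168.1.$host"
}
```

For example, `transit_addr 10 100` prints `172.16.10.100`, and `from_transit 172.16.10.100` prints `vlan10 192.168.1.100`.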

But this would be ugly, and probably difficult to debug.  Not to mention
that I've never done it, so it's only a paper solution.

 : Response packet from redhat flows
 : Packet enters Linux box src REDHAT:80 dst SUBNETIP10:NATPORT
 : Packet gets tagged with fwmark based on SUBNETIP to FWMARK 10
 : Packet gets unNAT'ed by kernel NAT table
 : Packet looks like src REDHAT:80 dst CLIENTIP:CLIENTPORT fwmark:10
 : iproute2 setup routes CLIENTIP to the correct client on the correct
 : VLAN (vlan10)
 : arp lookup assigned correct MAC address and sends the packet to the
 : switch on VLAN10

Your description of the outbound packet path leads me to believe that you
have an upstream router.  The description of the inbound packet flow omits
any mention of an upstream router.

 : Problems I can see biting me:

 : ARP tables.  Can the kernel maintain separate ARP tables for each VLAN?
 :   Each VLAN can have a machine with IP 192.168.1.100

The multiple ARP table question is also one I can't answer.  Maybe
Julian....

Certainly, the neighbor table itself supports entries for IP addresses on
multiple interfaces, so the same IP could be in the neighbor table with
different associations on each interface.  An example:

Imagine a host has two connections to the same media segment.  After causing
an ARP lookup on each interface, there are per-device entries in the
neighbor table:

# ping -c 1 -I eth0 10.10.20.33 > /dev/null 2>&1
# ping -c 1 -I eth1 10.10.20.33 > /dev/null 2>&1
# ip neigh show
10.10.20.33 dev eth1 lladdr 00:80:c8:f8:4a:51 nud reachable
10.10.20.33 dev eth0 lladdr 00:80:c8:f8:4a:51 nud reachable

I don't think you'd have any trouble setting up 100 routing tables, each
routing 192.168.1.0/24 via its own interface.  I would add the RPDB rules
at a relatively low priority (that is, a high preference number, since the
RPDB consults lower numbers first) so that other rules could be inserted
ahead of them.

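 # One routing table per VLAN, selected by fwmark.  Preference values
 # 5001-5100 leave room for other rules ahead of them.
 for ID in $( seq 1 100 ) ; do
   ip rule add fwmark $ID table $ID prio $(( 5000 + ID ))
   ip route add 192.168.1.0/24 dev vlan$ID table $ID
 done
 ip route flush cache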
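To check the result, a current iproute2 can ask the kernel which route a given mark selects (the `mark` option of `ip route get` may be missing from 2003-era iproute2, and 192.168.1.100 is just an example client). This sketch prints the rule listing plus one route query per mark; with the rules above in place, each answer should name dev vlan$ID.

```sh
#!/bin/sh
# Print one route query per fwmark.  CLIENT is an example address.
CLIENT=${CLIENT:-192.168.1.100}

gen_checks() {
    echo "ip rule show"
    for ID in $(seq 1 100); do
        echo "ip route get $CLIENT mark $ID"
    done
}

gen_checks
```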

 : ICMPs:  What happens when a client tries to ping the linux box
 : (192.168.1.1).  If I fwmark all incoming packets on a VLAN will the
 : kernel respond with a packet using the same fwmark?

I don't know.  Maybe somebody else on the list can answer this one....

 : ARP requests:  Same as the ICMPs.  Will the kernel be able to answer an
 : ARP request to 192.168.1.1

This shouldn't be a problem unless you are doing something very funky with
ARP.

So, in summary:

  - I don't think ARP will be a problem for you.  Julian and/or the
    VLAN list might be able to confirm this.
  - You will have to use some sort of NAT on the linux box in order to
    have a way to differentiate the return packets for each VLAN, but this
    you know already.
  - In a one-router solution, the trick will probably be the interaction
    between the connection tracking mechanisms and the fwmarking
    mechanism.
  - In a two-router solution, the trick will probably be the "second"
    route lookup after the packet has been NATted to 192.168.1.$host.
  - Unanswered question:  ICMP generated by the linux router itself.

Matthew...this is a very interesting question, and I'm quite intrigued by
your approach.  Please let us (the LARTC list) know if you do prove that
this can or cannot be done using the current tools available under linux.
Sadly, the Netscreen may be able to fulfill your need with less effort.

-Martin

 [1]  http://www.docum.org/stef.coene/qos/kptd/
 [2]  http://iptables-tutorial.frozentux.net/chunkyhtml/targets.html#TOSTARGET

-- 
Martin A. Brown --- SecurePipe, Inc. --- mabrown@xxxxxxxxxxxxxx

Linux Advanced Routing and Traffic Control