Thank you for responding. I don't think we've communicated the problem
clearly, though. You state that the ICMPs going PAST the firewall are
handled properly, but that's exactly the opposite of the situation. A UDP
packet hits the firewall, its TTL is expired, the firewall generates an
ICMP with its own source address (no NAT involved), and this is CORRECT!
The next UDP packet has a TTL large enough to pass, and so it does; the
router next in line then generates an ICMP TTL-exceeded and puts its own
IP in the source address. But as that ICMP packet travels back out through
the firewall into the real world, the source address gets changed to the
UDP packet's original destination address. So you see, the ICMP generated
by the firewall is handled correctly, and the ICMPs passing through the
firewall, generated by routers behind it, are the ones getting NATted
wrong.

What's worse, ONE of the routers behind the firewall manages to get its
real source address through; check out hop 6. In fairness, there IS a NAT
rule for the address of bond0rp (the machine listed in hop 6), which
presumably is used correctly, whereas there are NOT NAT rules for the
10.42.0.0 addresses which could be seen in the tcpdump. I don't expect to
be able to route to 10.42.0.0/16, of course, but I'd like to see the real
source address of those packets, since there is no NAT in effect for them.

Maybe I just don't understand?

> That's due to an unresolved minor problem in netfilter, which is hard
> (or impossible) to fix without modifying the underlying IP stack. (You
> can find long threads on it in the netfilter-devel archive.)
>
> The traceroute probe packets are specifically targeted UDP packets with
> increasing TTL field values. When such a packet with TTL=1 reaches bond0,
> the appropriate conntrack entry is prepared and the NAT rule is attached
> to it - and then the packet is dropped by the kernel, without leaving the
> stack. Consequently the conntrack entry is destroyed, without adding it
> to the conntrack hash table. Thus when the ICMP reply packet is sent by
> the stack, conntrack cannot find the conntrack entry (with the NAT rule)
> to which it really belongs and treats it as a normal packet by looking up
> a matching NAT rule, etc, etc.
>
> Please note, the ICMP packets replying to the traceroute packets going
> past the firewall are handled properly. It is just the traceroute packets
> hitting the firewall itself which are not.
>
> Either accept it and ignore the problem, or create special rules to
> handle those packets.
>
> T3
>
> > ---------> bond0 ---------> bond0rp ------> rp2 ----> customer
> >        eth0      eth1   eth0        eth1
> >
> > bond0 has the following rules as a sample:
> >
> > iptables -t nat -A PREROUTING -d 216.7.11.208/28 -j NETMAP --to 10.23.18.0/28
> > iptables -t nat -A POSTROUTING -s 10.23.18.0/28 -j NETMAP --to 216.7.11.208/28
> >
> > and bond0rp has the following for a complementary ruleset:
> >
> > iptables -t nat -A POSTROUTING -s 216.7.11.208/28 -j NETMAP --to 10.23.18.0/28
> > iptables -t nat -A PREROUTING -d 10.23.18.0/28 -j NETMAP --to 216.7.11.208/28
> >
> > Now, this seems to work. I can ping the machines behind my kludge, and
> > I can pass data back and forth, at least for ftp and http.
> > However, if I do a traceroute from a machine which lives near bond0, I
> > get the following very strange output:
> >
> > redhat/root: traceroute -n 216.7.11.209
> > traceroute to 216.7.11.209 (216.7.11.209), 30 hops max, 40 byte packets
> >  1  205.232.34.3     2.002 ms   1.62 ms    2.027 ms
> >  2  207.127.235.1    3.679 ms   1.85 ms    3.426 ms
> >  3  207.127.235.40   4.402 ms   5.382 ms   3.27 ms
> >  4  216.7.11.209     3.616 ms   2.978 ms   13.848 ms
> >  5  216.7.11.209     5.368 ms   23.634 ms  11.483 ms
> >  6  207.127.233.33   34.556 ms  29.082 ms  20.548 ms
> >  7  216.7.11.209     6.244 ms   6.158 ms   5.818 ms
> >  8  216.7.11.209     8.091 ms   *          9.082 ms
> >
> > ?!?!
> >
> > Hops 4, 5, and 7 are driving me crazy! I can only guess that the
> > connection tracking is grabbing hold of my NAT, and somehow reverse
> > mapping "automatically", but I can't figure out what I did wrong to
> > deserve exactly that behaviour...
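
For what it's worth, here is a minimal sketch of how one might confirm
where the rewrite happens (assuming eth1 is bond0's inside interface and
eth0 its outside interface, as in the diagram above - this is only a
capture to watch the packets, not a fix):

  tcpdump -n -i eth1 'icmp[icmptype] = icmp-timxceed'   # inside: real 10.42.x.x sources
  tcpdump -n -i eth0 'icmp[icmptype] = icmp-timxceed'   # outside: sources already rewritten to 216.7.11.209

Running both captures while the traceroute is in progress should show the
time-exceeded replies arriving on eth1 with the routers' real addresses and
leaving eth0 with the 216.7.11.209 source, which would at least pin the
rewrite on bond0's conntrack/NAT rather than on anything downstream.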