RE: How to debug iptables/conntrack?

> From: Florian Westphal <fw@xxxxxxxxx>
> Sent: Friday, May 10, 2019 3:25 AM
> Dexuan Cui <decui@xxxxxxxxxxxxx> wrote:
> > We're trying to figure out how a Docker NAT bridge occasionally sends out an
> > undesired TCP RST packet, which aborts the TCP connection unexpectedly.
>
> conntrack never sends tcp reset packets.

Hi Florian,
Thanks for the detailed explanation!

Here I'm using SNAT (Source NAT), not DNAT (Destination NAT).

Now I suspect conntrack may be losing the NAT tracking entry for some reason,
so remote server C's reply is treated as destined for B's own network stack
rather than being forwarded to the Docker container A. That would explain the
strange TCP RST packet, because B itself has no established TCP connection
matching that sport/dport pair with C.

So far, though, I cannot imagine how conntrack could lose the entry.
I'm using "conntrack -E -p tcp" to determine whether this is actually the case.
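For the record, the event monitoring I have in mind looks roughly like this
(the container address is the one from this thread; the server address is
redacted here as 40.121.XX.YY, so I filter on the source side only):

```shell
# Watch conntrack events for flows originating from the container;
# a DESTROY event just before the RST would confirm the suspicion
# that the tracking entry disappears.
conntrack -E -e DESTROY -p tcp --orig-src 172.17.0.2

# Separately, confirm the entry currently exists at all:
conntrack -L -p tcp --orig-src 172.17.0.2
```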

> > A: (the docker instance): 172.17.0.2
> > B: (the bridge): 10.35.4.56
> > C: (the remote server): 40.121.XX.YY.
> >
> > 1) A sends a TCP packet, through B, to C;
> >
> > 2) C's reply reaches to B;
> >
> > 3) B immediately sends out a TCP RST packet to C;
> >
> > 4) A thinks C doesn't receive the packet, so A re-transmits the packet 7
> >    times, through B; B still does the normal NAT translation, and forwards
> >    all the 7 packets to C; there is no response from C (I suppose C ignores
> >    the packets);
>
> This would imply that the conntrack entry is still in place.

Maybe not; as I described above, I actually suspect the entry is no longer
there for some reason.

Here the nf_conntrack_tcp_loose parameter is at its default ("enabled"),
meaning: for an established NAT'd TCP connection from A, through B, to C,
even after we manually delete the tracking entry with "conntrack -D",
A can still send data to C normally, because conntrack re-creates the
tracking entry when it sees the first mid-stream packet from A to C
(after we delete the entry).
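The behavior described above can be checked with something like the
following (a sketch; needs root, and the destination filter is omitted
because the server address is redacted in this thread):

```shell
# 1 = loose tracking enabled (the default): conntrack will pick up
# mid-stream packets and re-create a tracking entry for them.
sysctl net.netfilter.nf_conntrack_tcp_loose

# Delete the entry for the container's flow; with tcp_loose=1, the
# next packet from A should silently re-create it and the connection
# survives.
conntrack -D -p tcp --orig-src 172.17.0.2
```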

> > 5) A closes the connection by sending a TCP FIN packet; B still does the
> >    normal NAT translation, and forwards the packet to C; there is no
> >    response from C.
> >
> > We need to figure out what happens in step 3. It looks the bridge thinks
> > something bad happened so it tries to abort the TCP connection?
>
> It looks like the packet is pushed up to the ip stack and is routed to
> localhost, so it ends on bridge input path rather than entering the bridge
> forward path.

Yes, I also suspect so, as I described above.

> The only other explanation is that the iptables ruleset makes use of
> 'REJECT --reject-with tcp-reset' and that triggers for some reason.

I'm going to dump all the iptables rules with "iptables -L" and check them.

FWIW, actually, B is a customer VM and I can not access it... I'm asking the
customer to share the info I hope to check.
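When I get access to the info, I plan to ask for something like the
following rather than plain "iptables -L", since that only shows the
filter table:

```shell
# Dump every table (filter, nat, mangle, ...) with per-rule packet
# counters, so a matching REJECT rule would show a non-zero count:
iptables-save -c

# Look specifically for rules that could generate the RST:
iptables-save | grep -i 'reject-with tcp-reset'
```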

> > There are not a lot of concurrent TCP connections: usually there are only
> > about 5 concurrent TCP connections, so I don't think the conntrack module
> > runs out of the tracking table entries. We have checked "conntrack -L" and
> > there are only about 700 entries.
>
> This would have other symptoms; we don't blindly zap existing assured
> entries.

I agree.

> > Can you please recommend some tools that can trace how exactly the TCP
> > packet flow is processed by iptables/conntrack, especially in the case
> > of NAT?
>
> Are we talking about DNAT or SNAT?
> (I'd guess it's SNAT/MASQUERADE), so NAT should not even have any effect
> wrt. forwarding decision.

It's SNAT.

> > Now I'm studying some tools like ipset, nft and ulogd2.
> > It looks we're able to log some iptables/conntrack events when tracing
> > the packet flows, but I'm unsure if we're able to log the event of
> > the undesired TCP Reset packet here.
>
> Normally I'd suggest the TRACE target, however it generates a lot of log
> messages.
>
> I'd suggest to add
> iptables -I INPUT 1 -p tcp -s 172.17.0.2 -d 40.121.XX.YY -j LOG
> iptables -I OUTPUT 1 -p tcp --tcp-flags RST RST -d 172.17.0.2 -j LOG
>
> and see if that triggers.

Thank you for the helpful pointers!
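I'll probably use a small variant of the rules you suggested, with
log prefixes so the two cases are easy to tell apart in the kernel log
(the server address is redacted here as 40.121.XX.YY and needs to be
filled in):

```shell
# Replace 40.121.XX.YY with the real server address (redacted in
# this thread).
iptables -I INPUT 1 -p tcp -s 172.17.0.2 -d 40.121.XX.YY \
    -j LOG --log-prefix 'ct-input: '
iptables -I OUTPUT 1 -p tcp --tcp-flags RST RST -d 172.17.0.2 \
    -j LOG --log-prefix 'ct-rst-out: '

# Then watch the kernel log for either prefix:
dmesg -w | grep -E 'ct-input|ct-rst-out'
```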

I'm also reading this link:
https://home.regit.org/2014/02/logging-connection-tracking-event-with-ulogd/
It looks like ulogd can also be very helpful for tracing a NAT'd TCP flow.
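If I go the ulogd2 route, I expect the ulogd.conf stanza to look roughly
like this (a sketch based on the blog post above; plugin names and options
may differ by ulogd2 version):

```ini
[global]
# Log conntrack (NFCT) events, including the NAT'd tuples, to a file
stack=ct1:NFCT,ip2str1:IP2STR,print1:PRINTFLOW,emu1:LOGEMU

[ct1]
# Report each event as it happens instead of aggregating flows
hash_enable=0

[emu1]
file="/var/log/ulogd_nat.log"
```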

> Based on the description it should not, as 172.17.0.2 -> 40.121.XX.YY packets
> are supposed to be forwarded by the bridge.

Let me further debug this oddity...

Thanks,
-- Dexuan



