Hey folks,

I recently stumbled upon an issue in my iptables setup. After some
extensive debugging, I've found that the problem occurs when trying to
DNAT (+SNAT) a packet that comes in through a bridge back out through
the same bridge port it originated from.

The code ultimately responsible for this is the should_deliver
function [1], which prevents packets from being delivered back to their
originating port (to prevent bouncing broadcast messages, I believe).

[1]: https://github.com/torvalds/linux/blob/v3.14/net/bridge/br_forward.c#L30-L36

Another requirement for this issue to occur is that the
bridge-nf-call-iptables setting is at its default value of 1. Without
that, the packets are passed up through br_pass_frame_up normally.

Some more details about my setup:

  matthijs@grubby:~$ sudo brctl show br0
  bridge name     bridge id               STP enabled     interfaces
  br0             8000.5cff350f105e       no              eth0

  matthijs@grubby:~$ sudo ifconfig br0|grep inet
            inet addr:192.168.1.175  Bcast:192.168.1.255  Mask:255.255.255.0
            inet6 addr: fe80::5eff:35ff:fe0f:105e/64 Scope:Link

  matthijs@grubby:~$ sudo iptables -t nat -L
  Chain PREROUTING (policy ACCEPT)
  target     prot opt source               destination
  DNAT       tcp  --  anywhere             anywhere             tcp dpt:81 to:192.168.1.252

  Chain INPUT (policy ACCEPT)
  target     prot opt source               destination

  Chain OUTPUT (policy ACCEPT)
  target     prot opt source               destination

  Chain POSTROUTING (policy ACCEPT)
  target     prot opt source               destination
  SNAT       tcp  --  anywhere             anywhere             tcp dpt:81 to:192.168.1.175

When I now create a connection from a host on eth0 to
192.168.1.175:81, that packet gets dropped instead of being DNATed and
SNATed back through eth0 to 192.168.1.184. Enabling hairpin mode on
eth0, or disabling bridge-nf-call-iptables, makes it work as expected.

Now, is this a bug in the kernel? Or should I not be expecting this
setup to work?
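
To illustrate the check I mean, here is a minimal userspace sketch of
the same-port rule as I understand it from [1]. It is not the kernel
code: the struct and function names (bridge_port, packet,
should_deliver_model) are simplified stand-ins I made up for this mail,
and any other checks the real function performs are left out.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures; only the fields
 * needed for the same-port check are modelled (assumption). */
struct net_device {
    const char *name;
};

struct bridge_port {
    const struct net_device *dev; /* device behind this bridge port */
    bool hairpin;                 /* models the per-port hairpin mode flag */
    bool forwarding;              /* models the port being in forwarding state */
};

struct packet {
    const struct net_device *dev; /* models skb->dev */
};

/* Deliver only if the port is forwarding and either hairpin mode is
 * enabled or the packet is attributed to a different device than the
 * one behind the outgoing port. */
static bool should_deliver_model(const struct bridge_port *p,
                                 const struct packet *pkt)
{
    return (p->hairpin || pkt->dev != p->dev) && p->forwarding;
}
```

With hairpin off, a packet whose dev equals the port's dev is refused,
which matches the drop I am seeing. With hairpin on, the same packet is
delivered.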
While debugging, I reviewed the code to trace the path the packet takes
through the kernel, and this is what I found (this is just from review,
I haven't verified it with debug output):

 - The packet comes in through br_handle_frame.
 - The frame gets dumped into the NF_BR_PRE_ROUTING netfilter chain
   (i.e. the bridge / ebtables version, not the ip / iptables one).
 - The ebtables rules get called.
 - The br_nf_pre_routing hook for NF_BR_PRE_ROUTING gets called. This
   interrupts (returns NF_STOLEN) the handling of the
   NF_BR_PRE_ROUTING chain, and calls the NF_INET_PRE_ROUTING chain.
 - The br_nf_pre_routing_finish finish handler gets called after
   completing the NF_INET_PRE_ROUTING chain.
 - This handler resumes the handling of the interrupted
   NF_BR_PRE_ROUTING chain. However, because it detects that DNAT has
   happened, it sets the finish handler to
   br_nf_pre_routing_finish_bridge instead of the regular
   br_handle_frame_finish finish handler.
 - br_nf_pre_routing_finish_bridge runs. This sets skb->dev to the
   parent bridge, sets the BRNF_BRIDGED_DNAT flag and calls
   neigh->output(neigh, skb), which presumably resolves to one of the
   neigh_*output functions, each of which again calls dev_queue_xmit,
   which should (eventually) call br_dev_xmit.
 - br_dev_xmit sees the BRNF_BRIDGED_DNAT flag and calls
   br_nf_pre_routing_finish_bridge_slow instead of actually delivering
   the packet.
 - br_nf_pre_routing_finish_bridge_slow sets up the destination MAC
   address, sets skb->dev back to skb->physindev and calls
   br_handle_frame_finish.
 - br_handle_frame_finish calls br_forward.
 - br_forward calls should_deliver, which returns false when
   skb->dev == p->dev (and "hairpin mode" is not enabled), causing the
   packet to be dropped.

Some things to note:

 - Why does the packet get redirected to NF_INET_PRE_ROUTING in
   br_nf_pre_routing already? Is it important that it happens halfway
   through the NF_BR_PRE_ROUTING chain?
   If not, why not do it in br_handle_frame_finish / br_forward, when
   it has actually been established that the packet will be bridged
   and not routed?
 - Should should_deliver make an exception for DNAT'ed packets? Or
   perhaps only block broadcasts?

Also see this blogpost for a bit more detail about my original setup
and debugging process:
http://www.stderr.nl/Blog/Software/Linux/BouncingPacketsKernelBug.html

Gr.

Matthijs
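
P.S. To make the should_deliver question a bit more concrete, here is a
rough userspace model of the skb->dev bookkeeping from the trace,
together with one possible shape of a DNAT exception. This is only a
sketch under my own assumptions: the dnat_done field and the
dnat_exception parameter are hypothetical and do not exist in the
kernel, and the helper names merely mirror the kernel functions they
are loosely modelled on.

```c
#include <stdbool.h>

struct net_device {
    const char *name;
};

struct packet {
    const struct net_device *dev;       /* models skb->dev */
    const struct net_device *physindev; /* models the saved ingress port device */
    bool dnat_done;                     /* HYPOTHETICAL marker, not a kernel field */
};

struct bridge_port {
    const struct net_device *dev; /* device behind this bridge port */
    bool hairpin;                 /* models the per-port hairpin mode flag */
};

/* Packet arrives on a bridge port: remember the physical ingress device. */
static void receive_on_port(struct packet *pkt, const struct bridge_port *in)
{
    pkt->dev = in->dev;
    pkt->physindev = in->dev;
    pkt->dnat_done = false;
}

/* Loosely models br_nf_pre_routing_finish_bridge: after DNAT, the
 * packet is attributed to the bridge device itself. */
static void finish_bridge(struct packet *pkt, const struct net_device *bridge)
{
    pkt->dev = bridge;
    pkt->dnat_done = true;
}

/* Loosely models br_nf_pre_routing_finish_bridge_slow: the packet is
 * attributed to the physical ingress device again. */
static void finish_bridge_slow(struct packet *pkt)
{
    pkt->dev = pkt->physindev;
}

/* Same-port check, with an optional (hypothetical) exception that lets
 * DNAT'ed packets go back out through the port they came in on. */
static bool deliver(const struct bridge_port *out, const struct packet *pkt,
                    bool dnat_exception)
{
    if (dnat_exception && pkt->dnat_done)
        return true;
    return out->hairpin || pkt->dev != out->dev;
}
```

In this model, a packet received on eth0 and run through finish_bridge
and finish_bridge_slow ends up with dev pointing at eth0 again, so
deliver() refuses it for the eth0 port unless the exception (or hairpin
mode) is enabled. A packet that was never DNAT'ed is still refused
either way.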