Hi,

A little background: the 'socket' match is used with the tproxy feature, which lets a process bind to and spoof an arbitrary client IP address. In iptables, the socket match is used in PREROUTING to match any traffic addressed to such sockets. We then use that match to set a mark on the packet and force it to be routed locally rather than being passed on to the real holder of that IP address.

The problem appears when all of the following are true:

  - tproxy is in use;
  - iptables-controlled policy routing (i.e. set a mark in OUTPUT, use that mark in an ip rule) is in use;
  - the new egress interface has a lower MTU than the original one;
  - the server sent us a SYN-ACK packet with an MSS larger than the new egress interface can transmit.

In that case Linux will generate an ICMP fragmentation-needed message, but we never get to process it, because the socket match cannot be used in OUTPUT.

The attached patch only makes the changes for IPv4. It looks like IPv6 wants similar changes, but I don't have anything available to easily test that.

I would greatly appreciate any feedback on this patch, pointers if anything it does is wrong, or even alternative ways to solve this problem.

Thanks

-- 
Daniel Collins
Software Developer
smoothwall
daniel.collins@xxxxxxxxxxxxxx
www.smoothwall.com
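To make the MTU condition described above concrete, here is a minimal sketch of the arithmetic. It assumes IPv4 and TCP headers without options (20 bytes each); the function names are hypothetical and are not taken from the kernel or from the patch.

```c
#include <stdbool.h>

/* Assumption: IPv4 and TCP headers carry no options, 20 bytes each. */
enum { IPV4_HDR_LEN = 20, TCP_HDR_LEN = 20 };

/* Hypothetical helper: size of a full-sized packet for a given MSS. */
static inline unsigned int full_packet_len(unsigned int mss)
{
	return mss + IPV4_HDR_LEN + TCP_HDR_LEN;
}

/*
 * Hypothetical helper: true if a full-sized segment built for the
 * advertised MSS cannot leave through an interface with egress_mtu
 * without being fragmented.
 */
static inline bool exceeds_egress_mtu(unsigned int mss, unsigned int egress_mtu)
{
	return full_packet_len(mss) > egress_mtu;
}
```

For example, with an advertised MSS of 1460 and a rerouted egress interface whose MTU is 1400, a full-sized packet is 1500 bytes and does not fit, which is the situation in which the fragmentation-needed message is generated locally.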
Author: Harry Mason <harry.mason@xxxxxxxxxxxxxx>
Author: Daniel Collins <daniel.collins@xxxxxxxxxxxxxx>
Description: Allow use of the 'socket' iptables match in OUTPUT, for the
 purpose of capturing and rerouting fragmentation-needed messages generated
 in response to a tproxy socket trying to send messages larger than permitted
 by the MTU of the egress interface chosen by rerouting.

--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -115,16 +115,16 @@
 xt_socket_get_sock_v4(struct net *net, const u8 protocol,
 		      const __be32 saddr, const __be32 daddr,
 		      const __be16 sport, const __be16 dport,
-		      const struct net_device *in)
+		      const int ifindex)
 {
 	switch (protocol) {
 	case IPPROTO_TCP:
 		return __inet_lookup(net, &tcp_hashinfo,
 				     saddr, sport, daddr, dport,
-				     in->ifindex);
+				     ifindex);
 	case IPPROTO_UDP:
 		return udp4_lib_lookup(net, saddr, sport, daddr, dport,
-				       in->ifindex);
+				       ifindex);
 	}
 	return NULL;
 }
@@ -183,10 +183,35 @@
 	}
 #endif
 
-	if (!sk)
-		sk = xt_socket_get_sock_v4(dev_net(skb->dev), protocol,
+	/* For input packets, sk is the destination socket, so if it is already
+	 * defined there is no need to search again.
+	 *
+	 * For output packets, sk will be the source socket, but we are
+	 * interested in the destination socket, so force a lookup. This
+	 * supports locally generated ICMP errors for sockets with non-local
+	 * addresses.
+	 */
+	if (!par->in || !sk) {
+		/* Check for sockets in the network namespace associated with
+		 * the packet's device, if it has one (i.e. is an incoming packet),
+		 * else use the outgoing device made by the routing decision.
+		 *
+		 * Stolen from net/ipv4/icmp.c
+		 */
+		struct net *net = dev_net(skb->dev ?: skb_dst(skb)->dev);
+
+		/* ifindex is used when looking up socket if any sockets are
+		 * bound to a specific interface, we know the device on the
+		 * input side, but resort to ignoring any such sockets on the
+		 * output side.
+		 */
+		int ifindex = par->in ? par->in->ifindex : 0;
+
+		sk = xt_socket_get_sock_v4(net, protocol,
 					   saddr, daddr, sport, dport,
-					   par->in);
+					   ifindex);
+	}
+
 	if (sk) {
 		bool wildcard;
 		bool transparent = true;
@@ -417,7 +442,8 @@
 		.family		= NFPROTO_IPV4,
 		.match		= socket_mt4_v0,
 		.hooks		= (1 << NF_INET_PRE_ROUTING) |
-				  (1 << NF_INET_LOCAL_IN),
+				  (1 << NF_INET_LOCAL_IN) |
+				  (1 << NF_INET_LOCAL_OUT),
 		.me		= THIS_MODULE,
 	},
 	{
@@ -428,7 +454,8 @@
 		.checkentry	= socket_mt_v1_check,
 		.matchsize	= sizeof(struct xt_socket_mtinfo1),
 		.hooks		= (1 << NF_INET_PRE_ROUTING) |
-				  (1 << NF_INET_LOCAL_IN),
+				  (1 << NF_INET_LOCAL_IN) |
+				  (1 << NF_INET_LOCAL_OUT),
 		.me		= THIS_MODULE,
 	},
 #ifdef XT_SOCKET_HAVE_IPV6
@@ -452,7 +479,8 @@
 		.checkentry	= socket_mt_v2_check,
 		.matchsize	= sizeof(struct xt_socket_mtinfo1),
 		.hooks		= (1 << NF_INET_PRE_ROUTING) |
-				  (1 << NF_INET_LOCAL_IN),
+				  (1 << NF_INET_LOCAL_IN) |
+				  (1 << NF_INET_LOCAL_OUT),
 		.me		= THIS_MODULE,
 	},
 #ifdef XT_SOCKET_HAVE_IPV6
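
As a userspace illustration of what the .hooks change and the ifindex fallback in the patch amount to, here is a minimal sketch. The hook numbers follow include/uapi/linux/netfilter.h; the macros HOOKS_BEFORE and HOOKS_AFTER, the helpers hook_allowed() and lookup_ifindex(), and struct fake_dev are hypothetical names introduced only for this example and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hook numbers as defined in include/uapi/linux/netfilter.h. */
enum nf_inet_hooks {
	NF_INET_PRE_ROUTING,
	NF_INET_LOCAL_IN,
	NF_INET_FORWARD,
	NF_INET_LOCAL_OUT,
	NF_INET_POST_ROUTING,
};

/* Hook masks of the socket match before and after the patch. */
#define HOOKS_BEFORE ((1u << NF_INET_PRE_ROUTING) | (1u << NF_INET_LOCAL_IN))
#define HOOKS_AFTER  (HOOKS_BEFORE | (1u << NF_INET_LOCAL_OUT))

/* Hypothetical helper: may a match with this mask be used from `hook`? */
static inline bool hook_allowed(unsigned int hooks, enum nf_inet_hooks hook)
{
	return (hooks & (1u << hook)) != 0;
}

/* Stand-in for struct net_device, reduced to the one field used here. */
struct fake_dev {
	int ifindex;
};

/*
 * Mirrors the ifindex choice in the patch: use the input device's index
 * when there is an input device, otherwise fall back to 0.
 */
static inline int lookup_ifindex(const struct fake_dev *in)
{
	return in ? in->ifindex : 0;
}
```

With these definitions, hook_allowed(HOOKS_BEFORE, NF_INET_LOCAL_OUT) is false and hook_allowed(HOOKS_AFTER, NF_INET_LOCAL_OUT) is true, while lookup_ifindex(NULL) yields 0, which corresponds to the output-side case where par->in is not set.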