On 17.10.2016 22:27, Pablo Neira Ayuso wrote:
> On Mon, Oct 17, 2016 at 10:17:28PM +0200, Noel Kuntze wrote:
>> On 17.10.2016 22:11, Pablo Neira Ayuso wrote:
>>> On Mon, Oct 17, 2016 at 09:52:06PM +0200, Noel Kuntze wrote:
>>>> On 17.10.2016 21:44, Pablo Neira Ayuso wrote:
>>>>> On Fri, Sep 09, 2016 at 09:06:59AM +0200, Thomas Bach wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have two hosts with public IP addresses running Ubuntu 16.04 with
>>>>>> kernel version 4.4.0.
>>>>>>
>>>>>> I want to interconnect two containers (systemd-nspawn) with veth
>>>>>> interfaces running on these hosts in a server/client setup.
>>>>>>
>>>>>> On the first host, where the server runs in a container, I have
>>>>>> the following rules:
>>>>>>
>>>>>> # nft list ruleset
>>>>>> table ip nat {
>>>>>>     chain prerouting {
>>>>>>         type nat hook prerouting priority 0; policy accept;
>>>>>>         tcp dport { 4506, 4505} dnat 10.0.0.2
>>>>>>     }
>>>>>>
>>>>>>     chain output {
>>>>>>         type nat hook output priority 0; policy accept;
>>>>>>         tcp dport { 4505, 4506} dnat 10.0.0.2
>>>>>>     }
>>>>>>
>>>>>>     chain input {
>>>>>>         type nat hook input priority 0; policy accept;
>>>>>>     }
>>>>>>
>>>>>>     chain postrouting {
>>>>>>         type nat hook postrouting priority 0; policy accept;
>>>>>>         ip saddr 10.0.0.0/8 oif enp4s0 masquerade
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> On the second host, where the client runs, I have the following:
>>>>>>
>>>>>> # nft list ruleset
>>>>>> table ip nat {
>>>>>>     chain prerouting {
>>>>>>         type nat hook prerouting priority 0; policy accept;
>>>>>>     }
>>>>>>
>>>>>>     chain output {
>>>>>>         type nat hook output priority 0; policy accept;
>>>>>>     }
>>>>>>
>>>>>>     chain input {
>>>>>>         type nat hook input priority 0; policy accept;
>>>>>>     }
>>>>>>
>>>>>>     chain postrouting {
>>>>>>         type nat hook postrouting priority 0; policy accept;
>>>>>>         ip saddr 10.0.0.0/8 oif enp0s31f6 masquerade
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> This works as expected, without any problems at all. Now IPsec
>>>>>> enters the picture. As soon as I set up a policy to encrypt everything
>>>>>> between the two hosts, the following happens:
>>>>>> + I can still connect from the second host to the server in the
>>>>>>   container without problems,
>>>>>> + I can still /connect/ (i.e. establish a connection) from the
>>>>>>   container on the second host to the server on the first host, but
>>>>>> + in tcpdump listening on the interface of the container (on the
>>>>>>   second host) I see lots of TCP retransmissions, and the TCP
>>>>>>   connection is effectively broken.
>>>>>>
>>>>>> Can someone give me a hint what is going on here?
>>>>>
>>>>> Did you find the root cause for this problem?
>>>>
>>>> Probably missing TCP MSS clamping. Normal problem.
>>>> Can happen with broken PMTUD.
>>>>
>>>> We also need the policy match module to support IPsec in nftables.
>>>> Is that on the TODO list?
>>>
>>> I know Florian Westphal made a simple extension; he's got a patch in
>>> his queue. Trimming off most of it, just leaving this small chunk:
>>>
>>> diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
>>> index 6c1e024..76b70e1 100644
>>> --- a/net/netfilter/nft_meta.c
>>> +++ b/net/netfilter/nft_meta.c
>>> @@ -190,6 +190,9 @@ void nft_meta_get_eval(const struct nft_expr *expr,
>>>  		*dest = prandom_u32_state(state);
>>>  		break;
>>>  	}
>>> +	case NFT_META_SECPATH:
>>> +		*(__u8 *)dest = secpath_exists(skb);
>>> +		break;
>>>  	default:
>>>  		WARN_ON(1);
>>>  		goto err;
>>>
>>> Would this be enough for your use case?
>>
>> No, the problem is that in nftables, we can't tell IPsec-protected
>> packets apart from unprotected ones. But we need that, because
>> generally, we want to treat them differently. In iptables we can do
>> that with -m policy [additional args], but there's nothing like that
>> in nftables. We need complete support for all the options of the
>> policy match module in nftables.
>
> Are you using *all* options there? I'd appreciate it if you could
> elaborate a bit on the use cases where you use these different options.
>
>> I don't see what that three-line patch actually does. Would you
>> kindly elaborate?
>
> It allows matching whether the packet is protected or unprotected, in a
> true/false fashion.
>
> Thanks.
Well, I am active in the strongSwan community, so I believe I've seen all the use cases there are, and I've seen every option used except "--next" and "--strict". There are probably use cases for those as well, though.

--spi, --reqid, --tunnel-src, --tunnel-dst, --mode and --proto are used to identify different tunnels. Take, for example, an IPsec-enabled router that:
+ is part of an IPsec-protected LAN, with host-to-host transport mode tunnels using AH+ESP bundles between the hosts,
+ provides IPsec VPN access to roadwarrior users via tunnel mode, where the roadwarrior tunnels are marked with a particular, unique mark value, and
+ terminates site-to-site tunnels, also in tunnel mode.

A userspace component multiplexes broadcast and multicast packets from the LAN to the roadwarriors, as well as between different roadwarriors, by listening for those packets and sending them out with the mark value that was set on the IPsec SPs.

In this scenario:
+ --tunnel-src, --tunnel-dst and --mode are used to identify the host-to-host LAN transport mode tunnels.
+ --mode tunnel, --spi and --tunnel-dst are used to identify the roadwarrior tunnels.
+ --reqid is used to identify particular tunnels, which the userspace IKE daemon configures with a special reqid so that certain connections can be handled specially in the firewall configuration.

--spi is also used to identify several transport mode tunnel endpoints behind a NAT device. The different peers negotiated different SAs and SPs. --spi is used to tell them apart and to mark the connections from those clients with different connmark values, so that conntrack, an accounting system and the firewall on the host can all differentiate them.

I hope this text was enlightening. :)

-- 
Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658
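As a rough illustration of the kinds of policy-match rules described above, here is a hypothetical iptables-restore fragment. Every address, SPI, reqid and mark value in it is a made-up placeholder, not taken from any real deployment, and it is a sketch of individual rules rather than a complete firewall. The last rule shows the protected/unprotected distinction (--pol none) that nftables lacked at the time of this thread.

```
# Hypothetical iptables-restore input: all addresses, SPIs, reqids and
# mark values below are illustrative placeholders.
*mangle
# Two transport-mode peers behind the same NAT device, told apart by the
# SPI of their inbound ESP SAs and tagged with distinct connmarks so that
# conntrack, accounting and later filter rules can distinguish them.
-A PREROUTING -m policy --dir in --pol ipsec --mode transport --proto esp --spi 256 -j CONNMARK --set-mark 0x10
-A PREROUTING -m policy --dir in --pol ipsec --mode transport --proto esp --spi 257 -j CONNMARK --set-mark 0x11
COMMIT
*filter
# A site-to-site tunnel identified by its outer (tunnel) endpoint addresses.
-A FORWARD -m policy --dir in --pol ipsec --mode tunnel --tunnel-src 198.51.100.1 --tunnel-dst 203.0.113.1 -j ACCEPT
# Connections whose SAs were installed by the IKE daemon with a special reqid.
-A FORWARD -m policy --dir in --pol ipsec --reqid 42 -j ACCEPT
# Illustrative only: drop forwarded traffic that arrived without IPsec.
-A FORWARD -m policy --dir in --pol none -j DROP
COMMIT
```

Loaded with `iptables-restore --noflush`, these rules would be appended to the existing chains instead of replacing their contents.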