On Fri, Nov 27, 2020 at 03:10:48PM +0100, Phil Sutter wrote:
> On Fri, Nov 27, 2020 at 10:55:11AM +0100, Steffen Klassert wrote:
> > On Thu, Nov 26, 2020 at 02:12:00PM +0100, Phil Sutter wrote:
> > > > >
> > > > > Is this a bug or an expected quirk when using XFRM interface?
> > > >
> > > > This is expected behaviour. The xfrm interfaces are plaintext devices,
> > > > the plaintext packets are routed to the xfrm interface which guarantees
> > > > transformation. So the lookup that assigns skb_dst(skb)->xfrm
> > > > happens 'behind' the interface. After transformation,
> > > > skb_dst(skb)->xfrm will be cleared. So this assignment exists just
> > > > inside xfrm in that case.
> > >
> > > OK, thanks for the clarification.
> > >
> > > > Does netfilter match against skb_dst(skb)->xfrm? What is the exact case
> > > > that does not work?
> > >
> > > The reported use-case is a match against tunnel data in output hook:
> > >
> > > | table t {
> > > |     chain c {
> > > |         type filter hook output priority filter
> > > |         oifname eth0 ipsec out ip daddr 192.168.1.2
> > > |     }
> > > | }
> > >
> > > The ipsec expression tries to extract that data from skb_dst(skb)->xfrm
> > > if present. In xt_policy (for iptables), code is equivalent. The above
> > > works when not using xfrm_interface. Initially I assumed one just needs
> > > to adjust the oifname match, but even dropping it doesn't help.
> >
> > Yes, this does not work with xfrm interfaces. As said, they are plaintext
> > devices that guarantee transformation.
> >
> > Maybe you can try to match after transformation by using the secpath,
> > but not sure if that is what you need.
>
> Secpath is used for input only, no?

Yes, apparently :-/ There are cases where we have a secpath for output,
but you can't rely on it.

> I played a bit more with xfrm_interface and noticed that when used,
> NF_INET_LOCAL_OUT hook sees the packet (an ICMP reply) only once instead
> of twice as without xfrm_interface.
> I don't think using it should change
> behaviour that much apart from packets without matching policy being
> dropped. What do you think about the following fix? I checked forwarding
> packets as well and it looks like behaviour is identical to plain
> policy:
>
> diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
> index aa4cdcf69d471..24af61c95b4d4 100644
> --- a/net/xfrm/xfrm_interface.c
> +++ b/net/xfrm/xfrm_interface.c
> @@ -317,7 +317,8 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
>          skb_dst_set(skb, dst);
>          skb->dev = tdev;
>
> -        err = dst_output(xi->net, skb->sk, skb);
> +        err = NF_HOOK(skb_dst(skb)->ops->family, NF_INET_LOCAL_OUT, xi->net,
> +                      skb->sk, skb, NULL, skb_dst(skb)->dev, dst_output);
>          if (net_xmit_eval(err) == 0) {
>                  struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);

I don't mind that change, but we have to be careful on namespace
transition. xi->net is the namespace 'behind' the xfrm interface.
I guess this is the namespace where you want to do the match
because that is the namespace that has the policies and states
for the xfrm interface.

So I think that change is correct, I just wanted to point that
out explicitly.
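
To make the namespace point concrete, here is a rough userspace sketch of
the two call shapes in the diff: handing the packet straight to the output
function versus wrapping that call in a hook that runs in the namespace
'behind' the interface. Every toy_* name is hypothetical and only stands in
for the kernel object named in the comments. It models nothing of the real
xfrm lookup or transformation, only which namespace's hook gets to see the
packet and whether the xfrm dst is already attached at that point.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are the kernel's real types. */
enum toy_verdict { TOY_ACCEPT, TOY_DROP };

struct toy_packet {
    int dst_has_xfrm;   /* stands in for skb_dst(skb)->xfrm != NULL */
};

struct toy_netns {
    /* LOCAL_OUT hook registered in this namespace, or NULL for none */
    enum toy_verdict (*local_out)(struct toy_netns *ns,
                                  struct toy_packet *pkt);
    int hook_seen;      /* how often the hook ran here */
    int hook_seen_xfrm; /* ... and saw a packet with an xfrm dst */
    int sent;           /* packets handed to the output function */
};

/* Hook that accepts everything and records what it was shown. */
static enum toy_verdict toy_count_hook(struct toy_netns *ns,
                                       struct toy_packet *pkt)
{
    ns->hook_seen++;
    if (pkt->dst_has_xfrm)
        ns->hook_seen_xfrm++;
    return TOY_ACCEPT;
}

/* Stands in for dst_output(): just counts the packet as sent. */
static int toy_dst_output(struct toy_netns *ns, struct toy_packet *pkt)
{
    (void)pkt;
    ns->sent++;
    return 0;
}

/* Same shape as NF_HOOK(..., okfn): run the hook, continue if accepted. */
static int toy_nf_hook(struct toy_netns *ns, struct toy_packet *pkt,
                       int (*okfn)(struct toy_netns *, struct toy_packet *))
{
    assert(ns != NULL && okfn != NULL);
    if (ns->local_out && ns->local_out(ns, pkt) == TOY_DROP)
        return -1;
    return okfn(ns, pkt);
}

/*
 * Transmit on the toy interface. link_ns plays the role of xi->net.
 * use_hook == 0 mirrors the removed line, use_hook != 0 the added one.
 */
static int toy_xfrmi_xmit(struct toy_netns *link_ns, struct toy_packet *pkt,
                          int use_hook)
{
    pkt->dst_has_xfrm = 1;  /* stands in for skb_dst_set(skb, dst) */
    if (use_hook)
        return toy_nf_hook(link_ns, pkt, toy_dst_output);
    return toy_dst_output(link_ns, pkt);
}

/*
 * Local output: the plaintext packet passes the hook of the namespace
 * the device lives in (dev_ns), then is routed to the toy interface.
 */
static int toy_local_out(struct toy_netns *dev_ns, struct toy_netns *link_ns,
                         struct toy_packet *pkt, int use_hook)
{
    if (dev_ns->local_out && dev_ns->local_out(dev_ns, pkt) == TOY_DROP)
        return -1;
    return toy_xfrmi_xmit(link_ns, pkt, use_hook);
}
```

In this model, calling toy_local_out() with use_hook set to 0 lets only the
dev_ns hook run, and it never sees an xfrm dst. With use_hook set to 1 the
link_ns hook runs as well and does see one, which is the case the ipsec
match depends on.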