In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in ip6_finish_output2()") changed the behaviour of ip6_finish_output2() such that it creates and uses a neigh entry if none is found. Subsequently the 'n' field was removed from struct rt6_info. Unfortunately my analysis is that in the case of IPVS direct routing this change leads to incorrect behaviour as in this case packets may be output to a destination other than where they would be output according to the route table. In particular, the destination address may actually be a local address and empirically a neighbour lookup seems to result in it becoming unreachable. This patch resolves the problem by providing the destination address determined by IPVS to ip6_finish_output2() in the skb callback. Although this seems to work I can see several problems with this approach: * It is rather ugly, stuffing an IPVS exception right in the middle of IPv6 code. The overhead could be eliminated for many users by using a staic key. But none the less it is not attractive. * The use of the skb callback is may not be valid as it crosses from IPVS to IPv6 code. A possible, though unpleasant, alternative is to add a new field to struct sk_buff. * This covers all IPv6 packets output by IPVS but actually only those output using IPVS Direct-Routing need this. One way to resolve this would be to add a more fine-grained ipvs_property to struct sk_buff. Reported-by: Mark Brooks <mark@xxxxxxxxxxxxxxxx> Signed-off-by: Simon Horman <horms@xxxxxxxxxxxx> --- include/net/ip_vs.h | 6 ++++++ net/ipv6/ip6_output.c | 9 +++++++-- net/netfilter/ipvs/ip_vs_xmit.c | 2 ++ 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h index 1c2e1b9..11d90a6 100644 --- a/include/net/ip_vs.h +++ b/include/net/ip_vs.h @@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest) atomic_read(&dest->inactconns); } +struct ipvs_skb_cb { + struct in6_addr *daddr; +}; + +#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb) + #endif /* _NET_IP_VS_H */ diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index a54c45c..a340180 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -52,6 +52,7 @@ #include <net/addrconf.h> #include <net/rawv6.h> #include <net/icmp.h> +#include <net/ip_vs.h> #include <net/xfrm.h> #include <net/checksum.h> #include <linux/mroute6.h> @@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb) struct dst_entry *dst = skb_dst(skb); struct net_device *dev = dst->dev; struct neighbour *neigh; - struct in6_addr *nexthop; + struct in6_addr *nexthop, *daddr; int ret; skb->protocol = htons(ETH_P_IPV6); @@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb) } rcu_read_lock_bh(); - nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr); + if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property)) + daddr = IP_VS_SKB_CB(skb)->daddr; + else + daddr = &ipv6_hdr(skb)->daddr; + nexthop = rt6_nexthop((struct rt6_info *)dst, daddr); neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop); if (unlikely(!neigh)) neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false); diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c index c47444e..054b679 100644 --- a/net/netfilter/ipvs/ip_vs_xmit.c +++ b/net/netfilter/ipvs/ip_vs_xmit.c @@ -391,6 +391,8 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest, rt = (struct rt6_info *) dst; } + IP_VS_SKB_CB(skb)->daddr = daddr; + local = __ip_vs_is_local_route6(rt); if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) & rt_mode)) { -- 1.8.4 -- To unsubscribe from this list: send the line "unsubscribe lvs-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html