Thank you for your detailed and informative comments. I will address them and resend in a followup patch (as agreed to below)... On Sat, 2016-08-13 at 14:40 +0300, Julian Anastasov wrote: > Hello, > > On Tue, 9 Aug 2016, Dwip N. Banerjee wrote: > > > We decrement the IP ttl in all the modes in order to prevent infinite > > route loops. The changes were done based on Julian Anastasov's > > suggestions in a prior thread. > > > > The (ttl <= 1) check/discard and the actual decrement are done in > > __ip_vs_get_out_rt() and in __ip_vs_get_out_rt_v6(), for the IPv6 > > case. Because of the ttl change, the skb_make_writable() guard is > > also invoked therein. > > > > The !ip_vs_iph_icmp(ipvsh) checks are removed from > > ensure_mtu_is_adequate() as they seem unnecessary (icmp code doesn't > > send ICMP error in response to another ICMP error). > > > > Signed-off-by: Dwip Banerjee <dwip@xxxxxxxxxxxxxxxxxx> > > --- > > diff --git a/net/netfilter/ipvs/ip_vs_xmit.c > > b/net/netfilter/ipvs/ip_vs_xmit.c > > index 01d3d89..e3586bd 100644 > > --- a/net/netfilter/ipvs/ip_vs_xmit.c > > +++ b/net/netfilter/ipvs/ip_vs_xmit.c > > @@ -225,7 +225,7 @@ static inline bool ensure_mtu_is_adequate(struct > > netns_ipvs *ipvs, int skb_af, > > Note that your patch is corrupted here. May be the > email client wraps long lines. I guess Documentation/email-clients.txt > has instructions for your email client how to send patches > without modifying them. You can also test it yourself before > sending PATCHv2: > > scripts/checkpatch.pl --strict /tmp/ttl1.diff > > or > > patch -p1 --dry-run < /tmp/ttl1.diff > I did extract and test the patch (and run checkpatch.pl) on my system - the email client probably did the corruption. I will apply your suggestions before sending the followup patch. > > if (!skb->dev) > > skb->dev = net->loopback_dev; > > /* only send ICMP too big on first fragment */ > > - if (!ipvsh->fragoffs && !ip_vs_iph_icmp(ipvsh)) > > + if (!ipvsh->fragoffs) > > icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu); > > Better such change not to be part of this patch. > If you want to remove this call it should happen in > separate patch. > No problem... I had included it based on the prior suggestion.. but I agree that it is unrelated and is better done as a separate patch. > > IP_VS_DBG(1, "frag needed for %pI6c\n", > > &ipv6_hdr(skb)->saddr); > > @@ -241,8 +241,7 @@ static inline bool ensure_mtu_is_adequate(struct > > netns_ipvs *ipvs, int skb_af, > > return true; > > > > if (unlikely(ip_hdr(skb)->frag_off & htons(IP_DF) && > > - skb->len > mtu && !skb_is_gso(skb) && > > - !ip_vs_iph_icmp(ipvsh))) { > > + skb->len > mtu && !skb_is_gso(skb))) { > > Ditto > Will take it out... > > icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, > > htonl(mtu)); > > IP_VS_DBG(1, "frag needed for %pI4\n", > > @@ -266,6 +265,7 @@ __ip_vs_get_out_rt(struct netns_ipvs *ipvs, int > > skb_af, struct sk_buff *skb, > > struct rtable *rt; /* Route to the other host */ > > int mtu; > > int local, noref = 1; > > + struct iphdr *iph = ip_hdr(skb); > > > > if (dest) { > > dest_dst = __ip_vs_dst_check(dest); > > @@ -326,6 +326,14 @@ __ip_vs_get_out_rt(struct netns_ipvs *ipvs, int > > skb_af, struct sk_buff *skb, > > return local; > > } > > > > + if (iph->ttl <= 1) { > > + /* Tell the sender its packet died... */ > > + __IP_INC_STATS(dev_net(skb_dst(skb)->dev), > > + IPSTATS_MIB_INHDRERRORS); > > + icmp_send(skb, ICMP_TIME_EXCEEDED, ICMP_EXC_TTL, 0); > > + goto err_put; > > + } > > + > > This place is correct. But there is one problem. > When __ip_vs_get_out_rt is called, it is because dest (the real > server) is v4 but we can actually forward IPv6 datagram as indicated > by skb_af. > > This exception can happen only for TUN where we support v4-in-v6 > and v6-in-v4 tunnels. So, we have to create new function just like > ensure_mtu_is_adequate where we have to work based on skb_af. > We can hide there the ttl check, the skb_make_writable call and > the ttl decrease. skb_af indicates what kind of outer IP header > we have before processing. The new function will include handling > for both skb_af families just like it is done in ensure_mtu_is_adequate. > We will call it from this place. > I will investigate this section more, re-code it and test it. When ready, I will send out the patch... > > if (likely(!(rt_mode & IP_VS_RT_MODE_TUNNEL))) { > > mtu = dst_mtu(&rt->dst); > > } else { > > @@ -349,6 +357,13 @@ __ip_vs_get_out_rt(struct netns_ipvs *ipvs, int > > skb_af, struct sk_buff *skb, > > } else > > skb_dst_set(skb, &rt->dst); > > > > + /* don't propagate ttl change to cloned packets */ > > + if (!skb_make_writable(skb, sizeof(struct iphdr))) > > + goto err_put; > > + > > + /* Decrease ttl */ > > + ip_decrease_ttl(iph); > > + > > For the new function. > > > return local; > > > > err_put: > > @@ -414,6 +429,7 @@ __ip_vs_get_out_rt_v6(struct netns_ipvs *ipvs, int > > skb_af, struct sk_buff *skb, > > struct dst_entry *dst; > > int mtu; > > int local, noref = 1; > > + struct ipv6hdr *hdr = ipv6_hdr(skb); > > > > if (dest) { > > dest_dst = __ip_vs_dst_check(dest); > > @@ -473,6 +489,19 @@ __ip_vs_get_out_rt_v6(struct netns_ipvs *ipvs, int > > skb_af, struct sk_buff *skb, > > return local; > > } > > > > + /* check and decrement ttl */ > > + if (hdr->hop_limit <= 1) { > > + /* Force OUTPUT device used as source address */ > > + if (!dst) > > + dst = skb_dst(skb); > > + skb->dev = dst->dev; > > + icmpv6_send(skb, ICMPV6_TIME_EXCEED, ICMPV6_EXC_HOPLIMIT, 0); > > + __IP6_INC_STATS(net, ip6_dst_idev(dst), > > + IPSTATS_MIB_INHDRERRORS); > > + > > + goto err_put; > > + } > > + > > We will call the same new function here. > > > /* MTU checking */ > > if (likely(!(rt_mode & IP_VS_RT_MODE_TUNNEL))) > > mtu = dst_mtu(&rt->dst); > > @@ -498,6 +527,11 @@ __ip_vs_get_out_rt_v6(struct netns_ipvs *ipvs, int > > skb_af, struct sk_buff *skb, > > } else > > skb_dst_set(skb, &rt->dst); > > > > + /* don't propagate ttl change to cloned packets */ > > + if (!skb_make_writable(skb, sizeof(struct ipv6hdr))) > > + goto err_put; > > + > > + hdr->hop_limit--; > > For the new function. > > > return local; > > > > err_put: > > @@ -739,9 +773,6 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct > > ip_vs_conn *cp, > > } > > > > /* copy-on-write the packet before mangling it */ > > - if (!skb_make_writable(skb, sizeof(struct iphdr))) > > - goto tx_error; > > - > > We have to keep this call because our skb_make_writable > in the new function happens only for local=0 while above call > is still needed for the local=1 case. > You are right .. I had missed the local==1 case.. > > if (skb_cow(skb, rt->dst.dev->hard_header_len)) > > goto tx_error; > > > > @@ -831,9 +862,6 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct > > ip_vs_conn *cp, > > } > > > > /* copy-on-write the packet before mangling it */ > > - if (!skb_make_writable(skb, sizeof(struct ipv6hdr))) > > - goto tx_error; > > - > > Ditto, we have to keep it. Will change... > > > if (skb_cow(skb, rt->dst.dev->hard_header_len)) > > goto tx_error; > > > > @@ -1302,9 +1330,6 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct > > ip_vs_conn *cp, > > } > > > > /* copy-on-write the packet before mangling it */ > > - if (!skb_make_writable(skb, offset)) > > - goto tx_error; > > - > > May be we have to keep this skb_make_writable in the > case of ICMP because the calling function calculates 'offset' > to include many headers while our new skb_make_writable will > cover only outer IP header. It is needed also for the local=1 > case. > Will restore it... > > if (skb_cow(skb, rt->dst.dev->hard_header_len)) > > goto tx_error; > > > > @@ -1394,9 +1419,6 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct > > ip_vs_conn *cp, > > } > > > > /* copy-on-write the packet before mangling it */ > > - if (!skb_make_writable(skb, offset)) > > - goto tx_error; > > - > > Ditto, we have to keep it here. > Will restore... > > if (skb_cow(skb, rt->dst.dev->hard_header_len)) > > goto tx_error; > > Regards > > -- > Julian Anastasov <ja@xxxxxx> > Thanks Dwip Banerjee -- To unsubscribe from this list: send the line "unsubscribe lvs-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html