Eric W. Biederman wrote: > Benjamin Thery <benjamin.thery@xxxxxxxx> writes: > > >>My investigations on the increase of cpu load when running netperf inside a >>container (ie. through etun2<->etun1) is progressing slowly. >> >>I'm not sure the cause is fragmentation as we supposed initially. >>In fact, it seems related to forwarding the packets between the devices. >> >>Here is what I've tracked so far: >>* when we run netperf from the container, oprofile reports that the top >>"consuming" symbol is: "pskb_expand_head". Next comes >>"csum_partial_copy_generic". these symbols represents respectively 13.5% and >>9.7% of the samples. >>* Without container, these symbols don't show up in the first 20 entries. >> >>Who is calling "pskb_expand_head" in this case? >> >>Using systemtap, I determined that the call to "pskb_expand_head" comes from the >>skb_cow() in ip_forward() (l.90 in 2.6.20-rc5-netns). >> >>The number of calls to "pskb_expand_head" matches the number of invocations of >>ip_forward() (268000 calls for a 20 seconds netperf session in my case). > > > Ok. This seems to make sense, and is related to how we have configured the > network in this case. > > It looks like pskb_expand_head is called by skb_cow. > > skb_cow has two cases when it calls pskb_expand_head. > - When there are multiple people who have a copy of the packet > (tcpdump and friends) > - When there isn't enough room for the hard header. > > Any chance one of you guys looking into this can instrument up > ip_foward just before the call to skb_cow and find out which > reason it is? > > A cheap trick to make the overhead go away is probably to setup > ethernet bridging in this case... > > But if we can ensure the ip_foward case does not need to do anything > more than modify the ttl and update the destination that would > be good to. > > Anyway this does look very solvable. we have the hack below in ip_forward() to avoid skb_cow(), Banjamin, can you check whether it helps in your case please? (NOTE: you will need to replace check for NETIF_F_VENET with something else or introduce the same flag on etun device). diff -upr linux-2.6.18-rhel5.orig/net/ipv4/ip_forward.c linux-2.6.18-rhel5-028stab023/net/ipv4/ip_forward.c --- linux-2.6.18-rhel5.orig/net/ipv4/ip_forward.c 2006-09-20 07:42:06.000000000 +0400 +++ linux-2.6.18-rhel5-028stab023/net/ipv4/ip_forward.c 2007-03-20 17:22:45.000000000 +0300 @@ -86,6 +86,24 @@ int ip_forward(struct sk_buff *skb) if (opt->is_strictroute && rt->rt_dst != rt->rt_gateway) goto sr_failed; + /* + * We try to optimize forwarding of VE packets: + * do not decrement TTL (and so save skb_cow) + * during forwarding of outgoing pkts from VE. + * For incoming pkts we still do ttl decr, + * since such skb is not cloned and does not require + * actual cow. So, there is at least one place + * in pkts path with mandatory ttl decr, that is + * sufficient to prevent routing loops. + */ + iph = skb->nh.iph; + if ( +#ifdef CONFIG_IP_ROUTE_NAT + (rt->rt_flags & RTCF_NAT) == 0 && /* no NAT mangling expected */ +#endif /* and */ + (skb->dev->features & NETIF_F_VENET)) /* src is VENET device */ + goto no_ttl_decr; + /* We are about to mangle packet. Copy it! */ if (skb_cow(skb, LL_RESERVED_SPACE(rt->u.dst.dev)+rt->u.dst.header_len)) goto drop; @@ -94,6 +112,8 @@ int ip_forward(struct sk_buff *skb) /* Decrease ttl after skb cow done */ ip_decrease_ttl(iph); +no_ttl_decr: + /* * We now generate an ICMP HOST REDIRECT giving the route * we calculated. @@ -121,3 +141,5 @@ drop: _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers