On Sunday 30 June 2013 8:09:14 AM John A. Sullivan III wrote:
> On Fri, 2013-06-28 at 09:59 -0500, John McMonagle wrote:
> > On Friday, June 28, 2013 04:54:12 am Nicolas Sebrecht wrote:
> > > The 27/06/13, John McMonagle wrote:
> > > > Running traffic shaping both in and out.
> > > > Creating ptp connections via openvpn.
> > > > Route the tunnels with ospf.
> > > >
> > > > Having a problem with outgoing traffic shaping.
> > > > txqueuelen on the tunnels is normally 100.
> > > > At that setting I have horrible latency at times.
> > > > If I lower txqueuelen it keeps latency under control, but I end up with
> > > > excessive packet loss.
> > > >
> > > > The more I think about it, putting another queue before the traffic
> > > > shaping creates an unsolvable problem.
> > > > I'm tempted to try ipsec and gre tunnels but suspect the problem
> > > > will be the same.
> > > >
> > > > How about adding traffic shaping into the tunnels?
> > > > I have 5 tunnels; how would one get the tunnel shaping to work with
> > > > the shaping on the outgoing interface?
> > > >
> > > > Any suggestions?
> > >
> > > I have enabled htb with sfq on a router providing 8 openvpn tunnels. I
> > > made it using the "up" option in the configuration file of each VPN. It
> > > allows a shell script (hook) to be loaded once the TUN device is created
> > > by openvpn. The script just applies the QoS on the TUN device of the
> > > tunnel.
> > >
> > > I guess something very similar can be done on the client side if ever
> > > needed.
> >
> > Nicolas
> >
> > Not that it's relevant, but my tunnels are always up.
> > I can traffic shape, but the input to the tunnels needs to be set to a
> > fixed outgoing bandwidth.
> > I suspect if I set them all to 1/2 of the full bandwidth it would help a
> > little.
> > It would be ideal if the tunnel interface could be traffic shaped on one
> > of the sub-queues of the outgoing interface's traffic shaping.
> > I'm sure I have some of the terminology messed up ;-(
> >
> > I noticed that if I create a gre interface there are no transmit buffers.
> > If a gre interface has no buffers, maybe that would help?
>
> <snip>
> I'm not sure that it solves your problem but here are my notes from how
> we handled it:
> John

John, I have a partial understanding of what you are doing.
If you're having minimal packet loss with txqueuelen at 10, you must be doing
something right.
I'm still a bit confused by the use of ifb1 on the outgoing interfaces.
Is there anything that explains how the packets are processed through traffic
shaping?
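For anyone who wants to try the "up" hook approach Nicolas describes above, a
minimal sketch could look like the following. The script path, rate, and the
htb/sfq layout are illustrative assumptions, not taken from his actual
configuration:

#!/bin/sh
# /etc/openvpn/qos-up.sh (hypothetical path)
# Hooked in from the openvpn config of each tunnel with:
#   script-security 2
#   up /etc/openvpn/qos-up.sh
# openvpn passes the newly created TUN device name as $1.
DEV="$1"          # e.g. tun0
RATE="2mbit"      # assumed per-tunnel ceiling; pick what fits the link
tc qdisc replace dev "$DEV" root handle 1: htb default 10
tc class replace dev "$DEV" parent 1: classid 1:10 htb rate "$RATE" ceil "$RATE"
tc qdisc replace dev "$DEV" parent 1:10 handle 110: sfq perturb 10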
> Traffic shaping with VPN presents some challenges. Some VPN
> technologies such as OpenVPN and KLIPS create virtual interfaces. The
> traffic from these interfaces must be pooled with the traffic on the
> physical interfaces for traffic shaping. Moreover, the traffic cannot
> be double counted, e.g., if an OpenVPN packet comes in on eth1 on UDP
> port 1194 and then appears unencrypted as an SSH packet on interface
> tun0, how much bandwidth has that consumed for our HFSC service curve
> calculations?
> There is a similar problem with netkey because the same traffic passes
> through the same interface twice - once unencrypted and then encrypted.
> This also creates a challenge regarding visibility, as sometimes the
> unencrypted contents are not visible and thus cannot be classified. The
> problems are slightly different between egress and ingress traffic
> shaping.
>
> We ultimately found we could not use the most efficient form of
> classification, CONNMARK, but that is just as well, as we cannot use it
> on some devices, e.g., Endians use all the available marks internally,
> leaving none available for us.
>
> Egress VPN Traffic Shaping
>
> We will use an IFB interface to coalesce the traffic from the various
> interfaces to a single queue. This implies that we need to create a
> placeholder PRIO qdisc for each interface, including the physical
> interface, so that we can apply the redirecting filter to the interface.
> We can use a two-band queue and send all traffic to the first band, from
> which it is redirected into the IFB interface, e.g.,
>
> tc qdisc replace dev eth1 root handle 2: prio bands 2 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> tc filter replace dev eth1 parent 2:0 protocol ip prio 1 u32 match u8 0 0 flowid 2:1 action mirred egress redirect dev ifb1
>
> We then create an HFSC qdisc on ifb1 with appropriate classes. The
> challenge is visibility on the IFB interface. Traffic redirected from
> tun+ (and probably ipsec+, although we did not test that) has not yet
> been encrypted, so we could potentially examine the packet. However, the
> netkey traffic appears to be already encrypted when it reaches the IFB
> interface, foiling any tc filter based classification. We also tried
> using CONNMARK to mark the connection and then restore the mark onto
> each packet. In fact, this would be our preference, as it is the lowest
> overhead solution, but it failed. Perhaps the mark is not preserved when
> the packet is encrypted. I have asked on the Linux netdev list but
> have not received a response. The only thing that worked was the
> iptables CLASSIFY target. To avoid creating the same rule for every
> different interface, we created a user-defined ESHAPE (Egress SHAPE)
> chain and jumped all traffic going out on the physical interface or on
> the virtual interfaces (e.g., tun0) to it, e.g.,
>
> iptables -t mangle -N ESHAPE
> iptables -t mangle -A POSTROUTING -o eth1 -j ESHAPE
> iptables -t mangle -A POSTROUTING -o tun+ -j ESHAPE
> iptables -t mangle -A ESHAPE -p 6 --sport 82 -j CLASSIFY --set-class 1:10
> iptables -t mangle -A ESHAPE -p 6 --sport 443 -j CLASSIFY --set-class 1:10
>
> We had some concern that the encapsulated traffic, e.g., ESP or UDP port
> 1194, would be classified into the default HFSC queue and drag down any
> prioritized traffic, but this does not appear to be the case. We tested
> by having only rt curves, setting the default one much lower than the
> prioritized curve, and sending prioritized traffic through the tunnel;
> it all passed at the prioritized rate.
>
> Ingress VPN Traffic Shaping
>
> The issues were quite different on ingress. We similarly created an IFB
> interface to coalesce the traffic, but visibility was not a problem: the
> IFB interface saw the unencrypted traffic all the time. However, we
> could not use the CLASSIFY target, since it cannot be used in the mangle
> table PREROUTING chain. We could not use packet marking, since the
> packets arrive on the IFB interface before they have been marked. Thus,
> the only option was tc filters, and generally complicated linked filters,
> so that we accommodate the rare case where IP options are used, thus
> throwing off the calculation of the TCP packet offsets (because the IP
> header then becomes 24 rather than 20 bytes).
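To make the "linked filter" offset trick John describes concrete, here is a
commented fragment in the same style as his sample script below; the port and
class IDs are placeholders chosen for illustration, not from his setup:

# The parent filter matches IPv4 TCP and jumps ("link") into hash table 16:.
# "offset at 0 mask 0x0f00 shift 6" reads the first 16 bits of the IP header,
# masks out the 4-bit IHL field and shifts right by 6, which yields IHL * 4,
# i.e. the real IP header length in bytes. That length becomes the base for
# the "nexthdr+" offsets in the linked filter, so TCP port matches stay
# correct even when IP options push the header past 20 bytes.
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 handle 16: u32 divisor 1
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 match ip protocol 6 0xff link 16: offset at 0 mask 0x0f00 shift 6 plus 0
# In the linked table, "at nexthdr+2" is the TCP destination port field
# regardless of the actual IP header length.
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp dst 22 0xffff at nexthdr+2 flowid 4:30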
> We also have a problem in that the netkey packets are placed in the default
> HFSC queue and can drag down any decrypted priority traffic. Thus, we
> need to create a separate, high service queue for the encapsulated
> traffic. This does not appear to be necessary for tun+ traffic and, we
> assume, ipsec+ traffic. Since ingress traffic shaping works on back
> pressure, a high speed queue for encrypted traffic should not create a
> problem. In other words, even if we accept encapsulated traffic at a
> higher rate than we want, the outflow of decrypted traffic to the
> internal network is constrained by the rest of the HFSC queue, forcing
> packet drops of excessive traffic, which should slow down the sending
> stream, thus regulating the encrypted packets as well as the decrypted
> packets.
> We thought about using this principle of back pressure to move the
> traffic shaping to the egress of the various internal interfaces, but
> that would have still required an IFB interface to coalesce the traffic
> and would have required redirects for every interface. Thus, an ingress
> filter seems more efficient.
>
> Sample test script
>
> #!/bin/sh
> modprobe ifb
> ifconfig ifb0 up
> ifconfig ifb1 up
> tc qdisc replace dev eth1 root handle 2: prio bands 2 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> tc qdisc replace dev tun0 root handle 3: prio bands 2 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> tc qdisc replace dev ifb1 root handle 1 hfsc default 20
> tc class replace dev ifb1 parent 1:0 classid 1:1 hfsc ul rate 100000kbit ls rate 100000kbit

Are these classes at fixed maximum rates?

> tc class replace dev ifb1 parent 1:1 classid 1:20 hfsc rt rate 150kbit #ls rate 40000kbit
> tc class replace dev ifb1 parent 1:1 classid 1:10 hfsc rt rate 500kbit #ls rate 50000kbit
> tc class replace dev ifb1 parent 1:1 classid 1:30 hfsc sc rate 10000kbit
> tc qdisc replace dev ifb1 parent 1:20 handle 1201 sfq
> tc qdisc replace dev ifb1 parent 1:10 handle 1101 sfq
> tc qdisc replace dev ifb1 parent 1:30 handle 1301 sfq
> iptables -t mangle -N ESHAPE
> iptables -t mangle -A POSTROUTING -o eth1 -j ESHAPE
> iptables -t mangle -A POSTROUTING -o tun+ -j ESHAPE
> iptables -t mangle -A ESHAPE -p 6 --sport 82 -j CLASSIFY --set-class 1:10
> iptables -t mangle -A ESHAPE -p 6 --sport 443 -j CLASSIFY --set-class 1:10
> iptables -t mangle -A ESHAPE -p 6 --sport 822 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --dport 822 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --tcp-flags SYN,RST,ACK,FIN ACK -m length --length 20:43 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --sport 53 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --dport 53 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --sport 500 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --dport 500 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --sport 4500 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --dport 4500 -j CLASSIFY --set-class 1:30
>
> tc qdisc replace dev ifb0 root handle 4 hfsc default 20
> tc class replace dev ifb0 parent 4:0 classid 4:1 hfsc ul rate 100000kbit ls rate 100000kbit
> tc class replace dev ifb0 parent 4:1 classid 4:20 hfsc rt rate 150kbit #ls rate 40000kbit
> tc class replace dev ifb0 parent 4:1 classid 4:10 hfsc rt rate 500kbit #ls rate 50000kbit
> tc class replace dev ifb0 parent 4:1 classid 4:30 hfsc sc rate 10000kbit
> tc class replace dev ifb0 parent 4:1 classid 4:40 hfsc rt rate 100000kbit
> tc qdisc replace dev ifb0 parent 4:20 handle 4201 sfq
> tc qdisc replace dev ifb0 parent 4:10 handle 4101 sfq
> tc qdisc replace dev ifb0 parent 4:30 handle 4301 sfq
> tc qdisc replace dev ifb0 parent 4:40 handle 4401 sfq
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 1 u32 match ip protocol 50 0xff flowid 4:40
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 handle 16: u32 divisor 1
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 match ip protocol 6 0xff link 16: offset at 0 mask 0x0f00 shift 6 plus 0
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp dst 822 0xffff at nexthdr+2 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp src 822 0xffff at nexthdr+0 flowid 4:30
> # Send packets <64 bytes (u16 0 0xffc0 at 2) with only the ACK flag set (match u8 16 0xff at nexthdr+13) to the low latency queue
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match u16 0 0xffc0 at 2 match u8 16 0xff at nexthdr+13 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp src 443 0xffff at nexthdr+0 flowid 4:10
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp src 82 0xffff at nexthdr+0 flowid 4:10
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 handle 117: u32 divisor 1
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 match ip protocol 17 0xff link 117: offset at 0 mask 0x0f00 shift 6 plus 0
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp dst 53 0xffff at nexthdr+2 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp src 53 0xffff at nexthdr+0 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp dst 500 0xffff at nexthdr+2 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp src 500 0xffff at nexthdr+0 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp dst 4500 0xffff at nexthdr+2 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp src 4500 0xffff at nexthdr+0 flowid 4:30
>
> ip link set eth1 txqueuelen 10
> ip link set tun0 txqueuelen 10

Are the ethtool statements part of the shaping or a hardware fix?

> ethtool -K eth1 gso off gro off
> ethtool -K eth0 gso off gro off
> ethtool -K eth2 gso off gro off

In the next 2 lines, is this where the other traffic that was not handled by
the ESHAPE chain gets put into ifb1?

> tc filter replace dev eth1 parent 2:0 protocol ip prio 1 u32 match u8 0 0 flowid 2:1 action mirred egress redirect dev ifb1
> tc filter replace dev tun0 parent 3:0 protocol ip prio 1 u32 match u8 0 0 flowid 3:1 action mirred egress redirect dev ifb1
> tc qdisc replace dev eth1 ingress
> tc filter replace dev eth1 parent ffff: protocol ip prio 1 u32 match u8 0 0 action mirred egress redirect dev ifb0
> tc qdisc replace dev tun0 ingress
> tc filter replace dev tun0 parent ffff: protocol ip prio 1 u32 match u8 0 0 action mirred egress redirect dev ifb0
>
> Note that we have prioritized the IKE packets used to manage the IPSec
> connections.

Thanks for the information so far.

John
--
To unsubscribe from this list: send the line "unsubscribe lartc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html