Re: Traffic shaping and VPN tunnel problems

Linux Advanced Routing and Traffic Control


 



On Sunday 30 June 2013 8:09:14 AM John A. Sullivan III wrote:
> On Fri, 2013-06-28 at 09:59 -0500, John McMonagle wrote:
> > On Friday, June 28, 2013 04:54:12 am Nicolas Sebrecht wrote:
> > > The 27/06/13, John McMonagle wrote:
> > > > Running traffic shaping both in and out.
> > > > Creating ptp connections via openvpn.
> > > > Route the tunnels with ospf.
> > > > 
> > > > Having a problem with outgoing traffic shaping.
> > > > txqueuelen on the tunnels is normally 100.
> > > > At that setting I have horrible latency at times.
> > > > If I lower txqueuelen it keeps latency under control, but I end up
> > > > with excessive packet loss.
> > > > 
> > > > The more I think about it, putting another queue before the traffic
> > > > shaping creates an unsolvable problem.
> > > > I'm tempted to try ipsec and gre tunnels but suspect the problem
> > > > will be the same.
> > > > 
> > > > How about adding traffic shaping inside the tunnels?
> > > > I have 5 tunnels; how would one get the tunnel shaping to work with
> > > > the shaping on the outgoing interface?
> > > > 
> > > > Any suggestions?
> > > 
> > > I have enabled htb with sfq on a router providing 8 openvpn tunnels. I
> > > did it using the "up" option in the configuration file of each VPN,
> > > which allows loading a shell script (hook) once the TUN device is
> > > created by openvpn. The script just applies the QoS to the TUN device
> > > of the tunnel.
> > > 
> > > I guess something very similar could be done on the client side if
> > > ever needed.
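
Nicolas's setup presumably looks something like this (my own sketch; the
script path, rates, and class layout are guesses, not his actual values).
In each VPN's .conf:

    script-security 2
    up /etc/openvpn/qos-up.sh

and the hook itself:

    #!/bin/sh
    # /etc/openvpn/qos-up.sh - openvpn runs this once the TUN device
    # exists and passes the device name as $1
    DEV="$1"
    # HTB with SFQ leaves, as Nicolas describes (rates are made up)
    tc qdisc replace dev "$DEV" root handle 1: htb default 20
    tc class replace dev "$DEV" parent 1: classid 1:1 htb rate 2mbit ceil 2mbit
    tc class replace dev "$DEV" parent 1:1 classid 1:10 htb rate 1mbit ceil 2mbit
    tc class replace dev "$DEV" parent 1:1 classid 1:20 htb rate 1mbit ceil 2mbit
    tc qdisc replace dev "$DEV" parent 1:10 sfq
    tc qdisc replace dev "$DEV" parent 1:20 sfq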
> > 
> > Nicolas,
> > 
> > Not that it's relevant, but my tunnels are always up.
> > I can traffic shape the input to the tunnels, but that requires setting
> > a fixed outgoing bandwidth for each.
> > I suspect that if I set them all to 1/2 of the full bandwidth it would
> > help a little.
> > It would be ideal if the tunnel interface could be traffic shaped as one
> > of the sub-queues of the outgoing interface's traffic shaping.
> > I'm sure I have some of the terminology messed up ;-(
> > 
> > I noticed that if I create a gre interface there are no transmit buffers.
> > If a gre interface has no buffers, maybe that would help?
> > 
> <snip>
> I'm not sure that it solves your problem, but here are my notes on how
> we handled it:
> 

John,

I have a partial understanding of what you are doing.
If you're seeing minimal packet loss with txqueuelen at 10, you must be
doing something right.

I'm still a bit confused by the use of ifb1 on the outgoing interfaces.
Is there anything that explains how the packets are processed through
the traffic shaping?

> Traffic shaping with VPN presents some challenges.  Some VPN
> technologies such as OpenVPN and KLIPS create virtual interfaces.  The
> traffic from these interfaces must be pooled with the traffic on the
> physical interfaces for traffic shaping.  Moreover, the traffic cannot
> be double counted, e.g., if an OpenVPN packet comes in on eth1 on UDP
> port 1194 and then appears unencrypted as an SSH packet on interface
> tun0, how much bandwidth has that consumed for our HFSC service curve
> calculations?
> There is a similar problem with netkey because the same traffic passes
> through the same interface twice - once unencrypted and then encrypted.
> This also creates a challenge regarding visibility as sometimes the
> unencrypted contents are not visible and thus cannot be classified.  The
> problems are slightly different between egress and ingress traffic
> shaping.
> We ultimately found we could not use the most efficient form of
> classification, CONNMARK, but that is just as well, as we cannot use it
> on some devices; Endian firewalls, for example, use all the available
> marks internally, leaving none for us.
> Egress VPN Traffic Shaping
> We will use an IFB interface to coalesce the traffic from the various
> interfaces to a single queue.  This implies that we need to create a
> placeholder PRIO QDISC for each interface including the physical
> interface so that we can apply the redirecting filter to the interface.
> We can use a two band queue and send all traffic to the first band from
> which it is redirected into the IFB interface, e.g., 
> tc qdisc replace dev eth1 root handle 2: prio bands 2 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> tc filter replace dev eth1 parent 2:0 protocol ip prio 1 u32 match u8 0 0 flowid 2:1 action mirred egress redirect dev ifb1
> We then create an HFSC QDISC on ifb1 with appropriate classes.  The
> challenge is visibility on the IFB interface.  Traffic redirected from
> tun+ (and probably ipsec+, although we did not test that) has not yet
> been encrypted, so we could potentially examine the packet.  However,
> the netkey traffic appears to be already encrypted when it reaches the
> IFB interface, foiling any tc-filter-based classification.  We also
> tried using CONNMARK to mark the connection and then restore it for
> each packet.  In fact, this would be our preference, as it is the
> lowest-overhead solution, but it failed.  Perhaps the mark is not
> preserved when the packet is encrypted.  I have asked on the Linux
> net-dev list but have not received a response.  The only thing that
> worked was the iptables CLASSIFY target.  To avoid creating the same
> rule for every interface, we created a user-defined ESHAPE (Egress
> SHAPE) chain and jumped all traffic going out on the physical interface
> or on the virtual interfaces (e.g., tun0) to it, e.g., 
> 
> iptables -t mangle -N ESHAPE
> iptables -t mangle -A POSTROUTING -o eth1 -j ESHAPE
> iptables -t mangle -A POSTROUTING -o tun+ -j ESHAPE
> iptables -t mangle -A ESHAPE -p 6 --sport 82 -j CLASSIFY --set-class 1:10
> iptables -t mangle -A ESHAPE -p 6 --sport 443 -j CLASSIFY --set-class 1:10
> 
> We had some concern that the encapsulated traffic, e.g., ESP or UDP port
> 1194, would be classified into the default HFSC queue and drag down any
> prioritized traffic, but this does not appear to be the case.  We tested
> by having only rt curves, setting the default curve much lower than the
> prioritized curve, and sending prioritized traffic through the tunnel;
> it all passed at the prioritized rate.
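
If I read that test right, it was roughly this (my paraphrase, reusing
the rates from your sample script below):

    # rt-only curves: default class 1:20 far slower than the prioritized 1:10
    tc qdisc replace dev ifb1 root handle 1 hfsc default 20
    tc class replace dev ifb1 parent 1:0 classid 1:20 hfsc rt rate 150kbit
    tc class replace dev ifb1 parent 1:0 classid 1:10 hfsc rt rate 500kbit
    # then send traffic classified into 1:10 through the tunnel and check
    # that it still passes at ~500kbit rather than the default 150kbit
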
> Ingress VPN Traffic Shaping
> The issues were quite different on ingress. We similarly created an IFB
> interface to coalesce the traffic but visibility was not a problem.  The
> IFB interface saw the unencrypted traffic all the time.  However, we
> could not use the CLASSIFY target since it cannot be used in the mangle
> table PREROUTING chains.  We could not use packet marking since the
> packets arrive on the IFB interface before they have been marked.  Thus,
> the only option was tc filters, and generally complicated linked filters
> at that, so that we could accommodate the rare case where IP options are
> used, which would otherwise throw off the calculation of the TCP port
> offsets (because the IP header then becomes, e.g., 24 rather than 20
> bytes).
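
For what it's worth, here is how I read the linked-filter trick in your
script (your two lines, with my annotations):

    # an empty u32 hash table (handle 16:) to hold the TCP port matches
    tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 handle 16: u32 divisor 1
    # for TCP (protocol 6), link to table 16: after skipping the real IP
    # header: the 16-bit word at offset 0 carries version+IHL in its high
    # byte; masking 0x0f00 isolates IHL, and shifting right by 6 yields
    # IHL*4, the header length in bytes, so the port matches stay correct
    # even when IP options are present
    tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 match ip protocol 6 0xff link 16: offset at 0 mask 0x0f00 shift 6 plus 0
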
> We also have a problem that the netkey packets are placed in the default
> HFSC queue and can drag down any decrypted priority traffic.  Thus, we
> need to create a separate, high service queue for the encapsulated
> traffic.  This does not appear to be necessary for tun+ traffic and, we
> assume, ipsec+ traffic.  Since ingress traffic shaping works on back
> pressure, a high speed queue for encrypted traffic should not create a
> problem.  In other words, even if we accept encapsulated traffic at a
> higher rate than we want, the outflow of decrypted traffic to the
> internal network is constrained by the rest of the HFSC queue forcing
> packet drops of excessive traffic which should slow down the sending
> stream thus regulating the encrypted packets as well as the decrypted
> packets.
> We thought about using this principle of back pressure to move the
> traffic shaping to the egress of the various internal interfaces, but
> that would still have required an IFB interface to coalesce the traffic
> and redirects for every interface.  Thus, an ingress filter seems more
> efficient.
> Sample test script
> #!/bin/sh
> modprobe ifb
> ifconfig ifb0 up
> ifconfig ifb1 up
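> # ifb1 coalesces egress traffic, ifb0 ingress; the PRIO qdiscs below are
> # just placeholders to hang the redirect filters on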
> tc qdisc replace dev eth1 root handle 2: prio bands 2 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> tc qdisc replace dev tun0 root handle 3: prio bands 2 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> tc qdisc replace dev ifb1 root handle 1 hfsc default 20
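> # "default 20" sends unclassified traffic to 1:20; 1:1 caps aggregate egress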
> tc class replace dev ifb1 parent 1:0 classid 1:1 hfsc ul rate 100000kbit ls rate 100000kbit

Are these classes at fixed maximum rates?

> tc class replace dev ifb1 parent 1:1 classid 1:20 hfsc rt rate 150kbit #ls rate 40000kbit
> tc class replace dev ifb1 parent 1:1 classid 1:10 hfsc rt rate 500kbit #ls rate 50000kbit
> tc class replace dev ifb1 parent 1:1 classid 1:30 hfsc sc rate 10000kbit
> tc qdisc replace dev ifb1 parent 1:20 handle 1201 sfq
> tc qdisc replace dev ifb1 parent 1:10 handle 1101 sfq
> tc qdisc replace dev ifb1 parent 1:30 handle 1301 sfq
> iptables -t mangle -N ESHAPE
> iptables -t mangle -A POSTROUTING -o eth1 -j ESHAPE
> iptables -t mangle -A POSTROUTING -o tun+ -j ESHAPE
> iptables -t mangle -A ESHAPE -p 6 --sport 82 -j CLASSIFY --set-class 1:10
> iptables -t mangle -A ESHAPE -p 6 --sport 443 -j CLASSIFY --set-class 1:10
> iptables -t mangle -A ESHAPE -p 6 --sport 822 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --dport 822 -j CLASSIFY --set-class 1:30
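> # send small, ACK-only packets to the low-latency class (cf. the tc version below)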
> iptables -t mangle -A ESHAPE -p 6 --tcp-flags SYN,RST,ACK,FIN ACK -m length --length 20:43 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --sport 53 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --dport 53 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --sport 500 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --dport 500 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --sport 4500 -j CLASSIFY --set-class 1:30
> iptables -t mangle -A ESHAPE -p 6 --dport 4500 -j CLASSIFY --set-class 1:30
> 
> tc qdisc replace dev ifb0 root handle 4 hfsc default 20
> tc class replace dev ifb0 parent 4:0 classid 4:1 hfsc ul rate 100000kbit ls rate 100000kbit
> tc class replace dev ifb0 parent 4:1 classid 4:20 hfsc rt rate 150kbit #ls rate 40000kbit
> tc class replace dev ifb0 parent 4:1 classid 4:10 hfsc rt rate 500kbit #ls rate 50000kbit
> tc class replace dev ifb0 parent 4:1 classid 4:30 hfsc sc rate 10000kbit
> tc class replace dev ifb0 parent 4:1 classid 4:40 hfsc rt rate 100000kbit
> tc qdisc replace dev ifb0 parent 4:20 handle 4201 sfq
> tc qdisc replace dev ifb0 parent 4:10 handle 4101 sfq
> tc qdisc replace dev ifb0 parent 4:30 handle 4301 sfq
> tc qdisc replace dev ifb0 parent 4:40 handle 4401 sfq
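> # protocol 50 = ESP; give encapsulated IPSec its own high-rate class (4:40)
> # so it is not dumped into the default class and cannot drag down
> # decrypted priority traffic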
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 1 u32 match ip protocol 50 0xff flowid 4:40
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 handle 16: u32 divisor 1
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 match ip protocol 6 0xff link 16: offset at 0 mask 0x0f00 shift 6 plus 0
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp dst 822 0xffff at nexthdr+2 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp src 822 0xffff at nexthdr+0 flowid 4:30
> # Send packets <64 bytes (u16 0 0xffc0 at 2) with only the ACK flag set (match u8 16 0xff at nexthdr+13) to the low latency queue
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match u16 0 0xffc0 at 2 match u8 16 0xff at nexthdr+13 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp src 443 0xffff at nexthdr+0 flowid 4:10
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp src 82 0xffff at nexthdr+0 flowid 4:10
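> # same linked-table trick for UDP (protocol 17), using table 117: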
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 handle 117: u32 divisor 1
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 match ip protocol 17 0xff link 117: offset at 0 mask 0x0f00 shift 6 plus 0
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp dst 53 0xffff at nexthdr+2 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp src 53 0xffff at nexthdr+0 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp dst 500 0xffff at nexthdr+2 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp src 500 0xffff at nexthdr+0 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp dst 4500 0xffff at nexthdr+2 flowid 4:30
> tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp src 4500 0xffff at nexthdr+0 flowid 4:30
> 
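> # keep device queues short so queueing (and thus latency) happens in the shaper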
> ip link set eth1 txqueuelen 10
> ip link set tun0 txqueuelen 10

Are the ethtool statements part of the shaping or a hardware fix?
> ethtool -K eth1 gso off gro off
> ethtool -K eth0 gso off gro off
> ethtool -K eth2 gso off gro off

In the next two lines, is this where the other traffic that was not
handled by the ESHAPE chain gets put into ifb1?

> 
> tc filter replace dev eth1 parent 2:0 protocol ip prio 1 u32 match u8 0 0 flowid 2:1 action mirred egress redirect dev ifb1
> tc filter replace dev tun0 parent 3:0 protocol ip prio 1 u32 match u8 0 0 flowid 3:1 action mirred egress redirect dev ifb1
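> # ingress side: redirect packets arriving on eth1 and tun0 into ifb0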
> tc qdisc replace dev eth1 ingress
> tc filter replace dev eth1 parent ffff: protocol ip prio 1 u32 match u8 0 0 action mirred egress redirect dev ifb0
> tc qdisc replace dev tun0 ingress
> tc filter replace dev tun0 parent ffff: protocol ip prio 1 u32 match u8 0 0 action mirred egress redirect dev ifb0
> 
> Note that we have prioritized the IKE packets used to manage the IPSec
> connections.
> 
Thanks for the information so far.

John
--
To unsubscribe from this list: send the line "unsubscribe lartc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



