Re: Traffic shaping and VPN tunnel problems

Linux Advanced Routing and Traffic Control

On Fri, 2013-06-28 at 09:59 -0500, John McMonagle wrote:
> On Friday, June 28, 2013 04:54:12 am Nicolas Sebrecht wrote:
> > On 27/06/13, John McMonagle wrote:
> > > Running traffic shaping both in and out.
> > > Creating ptp connections via openvpn.
> > > Routing the tunnels with ospf.
> > > 
> > > Having a problem with outgoing traffic shaping.
> > > txqueuelen on the tunnels is normally 100.
> > > At that setting I have horrible latency at times.
> > > If I lower txqueuelen it keeps latency under control, but I end up
> > > with excessive packet loss.
> > > 
> > > The more I think about it, putting another queue before the traffic
> > > shaping creates an unsolvable problem.
> > > I'm tempted to try ipsec and gre tunnels, but suspect the problem
> > > will be the same.
> > > 
> > > How about adding traffic shaping into the tunnels?
> > > I have 5 tunnels; how would one get the tunnel shaping to work with
> > > the shaping on the outgoing interface?
> > > 
> > > Any suggestions?
> > 
> > I have enabled htb with sfq on a router providing 8 openvpn tunnels. I
> > did it using the "up" option in the configuration file of each VPN. It
> > allows loading a shell script (hook) once the TUN device is created by
> > openvpn. The script just applies the QoS on the TUN device of the tunnel.
> > 
> > I guess something very similar can be done on the client side if ever
> > needed.
> 
> Nicolas
> 
> Not that it's relevant, but my tunnels are always up.
> I can traffic shape the input to the tunnels, but that requires setting a 
> fixed outgoing bandwidth for each.
> I suspect if I set them all to 1/2 of the full bandwidth it would help a little.
> It would be ideal if the tunnel interface could be traffic shaped on one of the 
> sub-queues of the outgoing interface's traffic shaping.
> I'm sure I have some of the terminology messed up ;-(
> 
> I noticed that if I create a gre interface there are no transmit buffers.
> If a gre interface has no buffers, maybe that would help?
> 
<snip>
I'm not sure that it solves your problem, but here are my notes on how
we handled it:

Traffic shaping with VPN presents some challenges.  Some VPN
technologies, such as OpenVPN and KLIPS, create virtual interfaces.  The
traffic from these interfaces must be pooled with the traffic on the
physical interfaces for traffic shaping.  Moreover, the traffic cannot
be double counted: e.g., if an OpenVPN packet comes in on eth1 on UDP
port 1194 and then appears unencrypted as an SSH packet on interface
tun0, how much bandwidth has that consumed for our HFSC service curve
calculations?

There is a similar problem with netkey because the same traffic passes
through the same interface twice: once unencrypted and once encrypted.
This also creates a challenge regarding visibility, as sometimes the
unencrypted contents are not visible and thus cannot be classified.  The
problems are slightly different between egress and ingress traffic
shaping.

We ultimately found we could not use the most efficient form of
classification, CONNMARK, but that is just as well, as we cannot use it
on some devices; e.g., Endians use all the available marks internally,
leaving none available for us.

Egress VPN Traffic Shaping

We will use an IFB interface to coalesce the traffic from the various
interfaces into a single queue.  This implies that we need to create a
placeholder PRIO QDISC for each interface, including the physical
interface, so that we can apply the redirecting filter to it.  We can
use a two-band queue and send all traffic to the first band, from which
it is redirected into the IFB interface, e.g.,

tc qdisc replace dev eth1 root handle 2: prio bands 2 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tc filter replace dev eth1 parent 2:0 protocol ip prio 1 u32 match u8 0 0 flowid 2:1 action mirred egress redirect dev ifb1

We then create an HFSC QDISC on ifb1 with appropriate classes.  The
challenge is visibility on the IFB interface.  Traffic redirected from
tun+ (and probably ipsec+, although we did not test that) has not yet
been encrypted, so we could potentially examine the packet.  However,
the netkey traffic appears to be already encrypted when it reaches the
IFB interface, foiling any tc filter based classification.  We also
tried using CONNMARK to mark the connection and then restore the mark
for each packet.  In fact, this would be our preference, as it is the
lowest overhead solution, but it failed.  Perhaps the mark is not
preserved when the packet is encrypted.  I have asked on the Linux
netdev list but have not received a response.  The only thing that
worked was the iptables CLASSIFY target.  To avoid creating the same
rule for every interface, we created a user-defined ESHAPE (Egress
SHAPE) chain and jumped all traffic going out on the physical interface
or on the virtual interfaces (e.g., tun0) to it, e.g., 

iptables -t mangle -N ESHAPE
iptables -t mangle -A POSTROUTING -o eth1 -j ESHAPE
iptables -t mangle -A POSTROUTING -o tun+ -j ESHAPE
iptables -t mangle -A ESHAPE -p 6 --sport 82 -j CLASSIFY --set-class 1:10
iptables -t mangle -A ESHAPE -p 6 --sport 443 -j CLASSIFY --set-class 1:10
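
For reference, the CONNMARK variant we tried (and which failed) would
look roughly like this; the mark value and the sample rule are a
reconstruction for illustration, not our exact configuration:

# Mark the connection from the cleartext rule, then restore the mark
# onto every packet of the connection:
iptables -t mangle -A POSTROUTING -o eth1 -p 6 --sport 443 -j CONNMARK --set-mark 10
iptables -t mangle -A POSTROUTING -j CONNMARK --restore-mark
# Classify by firewall mark on the IFB interface:
tc filter replace dev ifb1 parent 1:0 protocol ip prio 1 handle 10 fw flowid 1:10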

We had some concern that the encapsulated traffic (e.g., ESP or UDP port
1194) would be classified into the default HFSC class and drag down any
prioritized traffic, but this does not appear to be the case.  We tested
by having only rt (real-time) curves, setting the default one much lower
than the prioritized curve, and sending prioritized traffic through the
tunnel; it all passed at the prioritized rate.
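
A rough sketch of that verification (the addresses and the use of
netcat and pv are illustrative, not our exact test):

# On this router, serve bulk data from the prioritized port (sport 443
# lands in class 1:10 via the ESHAPE rules):
#   nc -l -p 443 < /dev/zero
# On the far end of the tunnel, pull the data and watch the rate; with
# only rt curves it should run at the 1:10 rt rate, not the much lower
# default rate:
#   nc 10.8.0.1 443 | pv > /dev/null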

Ingress VPN Traffic Shaping

The issues were quite different on ingress.  We similarly created an IFB
interface to coalesce the traffic, but visibility was not a problem: the
IFB interface saw the unencrypted traffic all the time.  However, we
could not use the CLASSIFY target, since it cannot be used in the mangle
table PREROUTING chain.  We could not use packet marking, since the
packets arrive on the IFB interface before they have been marked.  Thus,
the only option was tc filters, and generally complicated linked filters
so that we accommodate the rare case where IP options are used, throwing
off the calculation of the TCP packet offsets (because the IP header is
then longer than the usual 20 bytes).
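
The offset computation in those linked filters (see the script below)
works as follows:

# 'offset at 0 mask 0x0f00 shift 6' takes the 16-bit word at the start
# of the IP header (version, IHL, TOS), masks out the IHL nibble, and
# shifts it so that IHL*4 becomes the byte offset of the TCP header:
#   IHL=5 (no options):   0x0500 >> 6 = 20 bytes
#   IHL=6 (with options): 0x0600 >> 6 = 24 bytes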

We also have a problem in that the netkey packets are placed in the
default HFSC class and can drag down any decrypted priority traffic.
Thus, we need to create a separate, high-service class for the
encapsulated traffic.  This does not appear to be necessary for tun+
traffic and, we assume, ipsec+ traffic.  Since ingress traffic shaping
works on back pressure, a high-speed class for encrypted traffic should
not create a problem.  In other words, even if we accept encapsulated
traffic at a higher rate than we want, the outflow of decrypted traffic
to the internal network is constrained by the rest of the HFSC
hierarchy; the resulting drops of excess traffic should slow down the
sending stream, thus regulating the encrypted packets as well as the
decrypted packets.
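
Concretely, in the script below this is the 4:40 class together with the
ESP filter:

tc class replace dev ifb0 parent 4:1 classid 4:40 hfsc rt rate 100000kbit
tc filter replace dev ifb0 parent 4:0 protocol ip prio 1 u32 match ip protocol 50 0xff flowid 4:40
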
We thought about using this principle of back pressure to move the
traffic shaping to the egress of the various internal interfaces, but
that would still have required an IFB interface to coalesce the traffic
and would have required redirects for every interface.  Thus, an ingress
filter seems more efficient.

Sample test script

#!/bin/sh
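# Load the IFB module and bring up the two IFB devices: ifb0 collects
# ingress traffic, ifb1 collects egress traffic.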
modprobe ifb
ifconfig ifb0 up
ifconfig ifb1 up
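# Placeholder two-band PRIO qdiscs on the physical and tunnel interfaces;
# all traffic goes to band 1, from which it is redirected to ifb1 below.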
tc qdisc replace dev eth1 root handle 2: prio bands 2 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tc qdisc replace dev tun0 root handle 3: prio bands 2 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
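# Egress HFSC hierarchy on ifb1 (1:20 is the default class), with SFQ
# leaves for fairness within each class.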
tc qdisc replace dev ifb1 root handle 1: hfsc default 20
tc class replace dev ifb1 parent 1:0 classid 1:1 hfsc ul rate 100000kbit ls rate 100000kbit
tc class replace dev ifb1 parent 1:1 classid 1:20 hfsc rt rate 150kbit #ls rate 40000kbit
tc class replace dev ifb1 parent 1:1 classid 1:10 hfsc rt rate 500kbit #ls rate 50000kbit
tc class replace dev ifb1 parent 1:1 classid 1:30 hfsc sc rate 10000kbit
tc qdisc replace dev ifb1 parent 1:20 handle 1201 sfq
tc qdisc replace dev ifb1 parent 1:10 handle 1101 sfq
tc qdisc replace dev ifb1 parent 1:30 handle 1301 sfq
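# Egress classification: jump traffic leaving eth1 or any tun interface
# to the ESHAPE chain and classify it with the CLASSIFY target.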
iptables -t mangle -N ESHAPE
iptables -t mangle -A POSTROUTING -o eth1 -j ESHAPE
iptables -t mangle -A POSTROUTING -o tun+ -j ESHAPE
iptables -t mangle -A ESHAPE -p 6 --sport 82 -j CLASSIFY --set-class 1:10
iptables -t mangle -A ESHAPE -p 6 --sport 443 -j CLASSIFY --set-class 1:10
iptables -t mangle -A ESHAPE -p 6 --sport 822 -j CLASSIFY --set-class 1:30
iptables -t mangle -A ESHAPE -p 6 --dport 822 -j CLASSIFY --set-class 1:30
iptables -t mangle -A ESHAPE -p 6 --tcp-flags SYN,RST,ACK,FIN ACK -m length --length 20:43 -j CLASSIFY --set-class 1:30
iptables -t mangle -A ESHAPE -p 6 --sport 53 -j CLASSIFY --set-class 1:30
iptables -t mangle -A ESHAPE -p 6 --dport 53 -j CLASSIFY --set-class 1:30
iptables -t mangle -A ESHAPE -p 6 --sport 500 -j CLASSIFY --set-class 1:30
iptables -t mangle -A ESHAPE -p 6 --dport 500 -j CLASSIFY --set-class 1:30
iptables -t mangle -A ESHAPE -p 6 --sport 4500 -j CLASSIFY --set-class 1:30
iptables -t mangle -A ESHAPE -p 6 --dport 4500 -j CLASSIFY --set-class 1:30

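# Ingress HFSC hierarchy on ifb0 (4:20 is the default class; 4:40 is the
# high-service class for encrypted traffic).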
tc qdisc replace dev ifb0 root handle 4: hfsc default 20
tc class replace dev ifb0 parent 4:0 classid 4:1 hfsc ul rate 100000kbit ls rate 100000kbit
tc class replace dev ifb0 parent 4:1 classid 4:20 hfsc rt rate 150kbit #ls rate 40000kbit
tc class replace dev ifb0 parent 4:1 classid 4:10 hfsc rt rate 500kbit #ls rate 50000kbit
tc class replace dev ifb0 parent 4:1 classid 4:30 hfsc sc rate 10000kbit
tc class replace dev ifb0 parent 4:1 classid 4:40 hfsc rt rate 100000kbit
tc qdisc replace dev ifb0 parent 4:20 handle 4201 sfq
tc qdisc replace dev ifb0 parent 4:10 handle 4101 sfq
tc qdisc replace dev ifb0 parent 4:30 handle 4301 sfq
tc qdisc replace dev ifb0 parent 4:40 handle 4401 sfq
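# ESP (IP protocol 50) goes to the high-service class 4:40 so encrypted
# traffic is not throttled by the default class.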
tc filter replace dev ifb0 parent 4:0 protocol ip prio 1 u32 match ip protocol 50 0xff flowid 4:40
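# Linked u32 hash table for TCP; 'offset at 0 mask 0x0f00 shift 6' derives
# the IP header length from the IHL field so that the TCP matches below
# still work when IP options are present.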
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 handle 16: u32 divisor 1
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 match ip protocol 6 0xff link 16: offset at 0 mask 0x0f00 shift 6 plus 0
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp dst 822 0xffff at nexthdr+2 flowid 4:30
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp src 822 0xffff at nexthdr+0 flowid 4:30
# Send packets <64 bytes (u16 0 0xffc0 at 2) with only the ACK flag set (match u8 16 0xff at nexthdr+13) to the low latency queue
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match u16 0 0xffc0 at 2 match u8 16 0xff at nexthdr+13 flowid 4:30
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp src 443 0xffff at nexthdr+0 flowid 4:10
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 16:0 match tcp src 82 0xffff at nexthdr+0 flowid 4:10
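# Linked u32 hash table for UDP: DNS and IKE/NAT-T (ports 500 and 4500).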
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 handle 117: u32 divisor 1
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 match ip protocol 17 0xff link 117: offset at 0 mask 0x0f00 shift 6 plus 0
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp dst 53 0xffff at nexthdr+2 flowid 4:30
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp src 53 0xffff at nexthdr+0 flowid 4:30
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp dst 500 0xffff at nexthdr+2 flowid 4:30
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp src 500 0xffff at nexthdr+0 flowid 4:30
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp dst 4500 0xffff at nexthdr+2 flowid 4:30
tc filter replace dev ifb0 parent 4:0 protocol ip prio 2 u32 ht 117:0 match udp src 4500 0xffff at nexthdr+0 flowid 4:30

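# Keep the device queues short for latency, and disable GSO/GRO so tc
# sees real packet sizes.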
ip link set eth1 txqueuelen 10
ip link set tun0 txqueuelen 10
ethtool -K eth1 gso off gro off
ethtool -K eth0 gso off gro off
ethtool -K eth2 gso off gro off

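# Redirect egress traffic (eth1 and tun0) to ifb1 and ingress traffic
# to ifb0.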
tc filter replace dev eth1 parent 2:0 protocol ip prio 1 u32 match u8 0 0 flowid 2:1 action mirred egress redirect dev ifb1
tc filter replace dev tun0 parent 3:0 protocol ip prio 1 u32 match u8 0 0 flowid 3:1 action mirred egress redirect dev ifb1
tc qdisc replace dev eth1 ingress
tc filter replace dev eth1 parent ffff: protocol ip prio 1 u32 match u8 0 0 action mirred egress redirect dev ifb0
tc qdisc replace dev tun0 ingress
tc filter replace dev tun0 parent ffff: protocol ip prio 1 u32 match u8 0 0 action mirred egress redirect dev ifb0

Note that we have prioritized the IKE packets used to manage the IPSec
connections.

