Re: tc question about ingress bandwidth splitting

Philip Prindeville <philipp_subx@xxxxxxxxxxxxxxxxxxxxx> · Tue, 24 Mar 2020 00:51:20 -0600

Hi Grant,

> On Mar 22, 2020, at 4:59 PM, Grant Taylor <gtaylor@xxxxxxxxxxxxxxxxxx> wrote:
> 
> On 3/22/20 3:56 PM, Philip Prindeville wrote:
>> Hi all,
> 
> Hi Philip,
> 
>> The uplink is G.PON 50/10 mbps.
> 
> Aside:  /Gigabit/ PON serving 50 / 10 Mbps.  ~chuckle~

Well, it’s exactly because it *isn’t* 1Gbps each direction that I need good shaping.  I could get more, but I’d also pay more.

> 
>> I’d like to cap the usage on “guest” to 10/2 mbps.  Any unused bandwidth from “guest” goes to “production”.
> 
> Does any of production's unused bandwidth go to guest?  Or is guest hard capped at 10 & 2?

No.  The idea being that “guest” relies on the kindness of strangers… whereas “production” has a guaranteed SLA of at least 40/8 mbps.

> 
>> I thought about marking the traffic coming in off “wan" (the public interface).
> 
> One of the most important lessons that I remember about QoS is that you can only /effectively/ limit what you send.

Right.  In this case I’m limiting (or pacing) the ACKs so that the sender paces his data.

> 
> Read:  You can't limit what is sent down your line to your router.

For UDP not at all.  For TCP you can apply back pressure, as above.  If the sender has filled his window, and I hold back any ACKs, he can’t send anything more until I do send an ACK.
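
To put a rough number on that back pressure: a window-limited sender can have at most one window in flight per round trip, so its throughput is bounded by window / RTT. A minimal sketch of the arithmetic, assuming a 64 KiB window and a 120 ms RTT (both values are illustrative assumptions, not measurements, and `max_bps` is a made-up helper name):

```shell
#!/bin/sh
# Upper bound on TCP throughput when the sender is window-limited:
# at most one window of data can be delivered per round trip.
# WINDOW_BYTES and RTT_MS are illustrative assumptions.
WINDOW_BYTES=65536
RTT_MS=120

max_bps() {
    # $1 = window in bytes, $2 = RTT in milliseconds; prints bit/s
    echo $(( $1 * 8 * 1000 / $2 ))
}

echo "window-limited ceiling: $(max_bps "$WINDOW_BYTES" "$RTT_MS") bit/s"
```

With those numbers the ceiling comes out a little under 4.4 mbps, however fast the link is; delaying the ACKs stretches the effective RTT and lowers it further.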

> 
> Further read:  You will receive more down your line than the 10 & 2 that you limit guest to, but you can feed guest at 10 & 2.

Correct.  Eventually the sender will back off in an attempt to reach a congestion-free steady state.

My scenario, as I said, is a SoHo router.  I don’t have a lot of servers behind it that receive bursts of incoming traffic asynchronously from outside (other than email, which I host locally).

If my daughter decides to watch an HD movie on an iPad during the day while I’m working, I don’t want that traffic overrunning my network and causing me to not be able to work.  In that scenario, the connection is originating internally and going outbound, and it’s long-lived (where "long-lived" is any duration of 20 or more RTTs).

> 
>> Then using HTB to have a 50 mbps cap at the root, and allocating 10mb/s to the child “guest”.  The other sibling would be “production”, and he gets the remaining traffic.
>> Upstream would be the reverse, marking ingress traffic from “guest” with a separate tag.  Allocating upstream root on “wan” with 10 mbps, and the child “guest” getting 2 mbps.  The remainder goes to the sibling “production”.
> 
> It's been 15+ years since I've done much with designing QoS trees.  I'm sure that things have changed since the last time I looked at them.

Only slightly less for me:  I did a traffic-shaper plugin for Arno’s Internet Firewall (AIF) about 12 years ago.  I’ve since forgotten everything.

> 
>> Should be straightforward enough, right? (Well, forwarding is more straightforward than traffic terminating on the router itself, I guess… bonus points for getting that right, too.)
> 
> As they say, the devil is in the details.
> 
> Conceptually, it's simple enough.  The particulars of the execution are going to take effort.

Yup.  And I’m hoping to be able to not need ifb to do it.

> 
>> I’m hoping that the limiting will work adequately so that the end-to-end path has adequate congestion avoidance happening, and that upstream doesn’t overrun the receiver and cause a lot of packets to be dropped on the last hop (worst case of wasted bandwidth).
> 
> (See further read above.)
> 
>> Not sure if I need special accommodations for bursting or if that would just delay the “settling” of congestion avoidance into steady-state.
> 
> Well, if the connection is a hard 50 & 10, there's nothing that can burst over that.

Sure, for the total.  I meant “guest” bursting over his allotted 10/2 mbps for a short duration, say 600ms (I came up with that as being 5 RTTs of 120ms).  I figure that’s enough for slow-start to ramp up into steady state…
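
For sizing such a burst, the usual token-bucket arithmetic applies: a bucket of B bytes that refills at rate r and drains at peak rate P lasts B / (P - r) seconds, so B = (P - r) * t. A small sketch, assuming “guest” refills at its 10 mbps (2 mbps up) allotment and could drain at the full 50 mbps (10 mbps up) line rate for 600 ms; `burst_bytes` is a made-up helper name:

```shell
#!/bin/sh
# Token-bucket sizing: bytes needed to sustain a burst at peak rate
# for a given duration while the bucket refills at the sustained rate.
burst_bytes() {
    # $1 = peak rate (bit/s), $2 = sustained rate (bit/s), $3 = duration (ms)
    # Work in bits per millisecond to keep intermediate values small.
    echo $(( ($1 - $2) / 1000 * $3 / 8 ))
}

echo "downlink bucket: $(burst_bytes 50000000 10000000 600) bytes"
echo "uplink bucket:   $(burst_bytes 10000000 2000000 600) bytes"
```

That works out to about 3 MB down and 600 kB up. In HTB terms the result would presumably feed the `burst` or `cburst` class parameters, but which one fits depends on how the tree is laid out.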

> 
> The last time I dealt with bursting, I found that it was a lot of effort, for minimal return on said effort.  Further, I was able to get quite similar effect by allowing production and guest to use the bandwidth that the other didn't use, which was considerably simpler to set up.

Well, now you’ve got me confused.  Because if each can borrow from the other, where’s the SLA?  Where’s the cap?  Who gets prioritized?

I could be completely unshaped, and have both borrowing from each other… which is the degenerate case.

> 
> The bursting I used in the past was bucket based (I don't remember the exact QoS term) where the bucket filled at the defined rate, and could empty its contents as fast as it could be taken out.  So if the bucket was 5 gallons, then a burst at line rate up to 5 gallons was possible. Then it became a matter of how big the bucket needed to be, 5 gallons, 55 gallons, 1000 gallons, etc.
> 
> I found that guaranteeing each class a specific amount of bandwidth and allowing the unused bandwidth to be used by other classes simpler and just as effective.

Yeah, and indeed that’s what HTB excels at.

> 
> Read:  Speed of burst, without the complexity and better (more consistent) use of the bandwidth.  Remember, if the bandwidth isn't used, it's gone, wasted, so why not let someone use it?

Agreed.

Although… in the case of the “guest” network, I don’t ever want it performing better than the hard SLA of 10/2 mbps, or people will complain when they don’t get extra bandwidth.  If they’re conditioned to think that “I’m on the guest network, and 10/2 mbps is all I’m going to get” then they’ll be happy with it and won’t complain.

I don’t want to hear, “well, this was so much better two days ago!”

My answer is, “It’s free.  You’re getting it by someone else’s good graces… be grateful you’re getting anything at all.”
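
Put together, the hard cap on “guest” and the borrowing for “production” map onto HTB's `rate` and `ceil` parameters. A minimal sketch of the download side, under some assumptions: `DEV` is a placeholder for whatever single device ends up carrying all download traffic toward the LAN (ifb0 here is only a guess), the class IDs are arbitrary, and the script only prints the commands unless `TC=tc` is set:

```shell
#!/bin/sh
# Dry run by default: prints the tc commands instead of running them.
# Set TC=tc (as root) to actually apply them.
TC="${TC:-echo tc}"
# Placeholder device (assumption): the one interface all download
# traffic passes through on its way to the LAN side.
DEV="${DEV:-ifb0}"

downlink_tree() {
    # Root qdisc; unclassified traffic falls into class 1:20 (production).
    $TC qdisc add dev "$DEV" root handle 1: htb default 20
    # Parent class: the whole 50 mbps downlink.
    $TC class add dev "$DEV" parent 1: classid 1:1 htb rate 50mbit ceil 50mbit
    # guest: rate equals ceil, so it never borrows beyond 10 mbps.
    $TC class add dev "$DEV" parent 1:1 classid 1:10 htb rate 10mbit ceil 10mbit
    # production: guaranteed 40 mbps, may borrow up to the full 50 mbps.
    $TC class add dev "$DEV" parent 1:1 classid 1:20 htb rate 40mbit ceil 50mbit
}

downlink_tree
```

Because guest's `ceil` equals its `rate`, idle production bandwidth is never lent to it, while production's `ceil` of 50mbit lets it soak up whatever guest leaves unused.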

> 
>> Also not sure if ECN is worth marking at this point.  Congestion control is supposed to work better than congestion avoidance, right?
> 
> If I could relatively easily mark things with ECN, I would.  But I don't know how valuable ECN really is.  I've not looked in 10+ years, and the last time I did, I didn't find much that was actually utilizing it.

Some ISPs were actually squashing the bits, and got spanked severely by the FCC.

Also, some older routers’ IP stacks were not ECN aware, and had the older bit definitions (remember that RFC 3168 and ECN borrowed the ECT(0) bit from TOS/LOWCOST from RFC 791 and 1349).
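
For reference, RFC 3168 places the ECN field in the two low-order bits of the old TOS byte. A small sketch that decodes the codepoint from a TOS value (`ecn_codepoint` is a made-up helper name, and the sample values are arbitrary):

```shell
#!/bin/sh
# Decode the RFC 3168 ECN codepoint from a TOS / traffic-class byte.
ecn_codepoint() {
    # $1 = TOS byte, decimal or 0x-prefixed hex; ECN is the low two bits.
    case $(( $1 & 3 )) in
        0) echo "Not-ECT" ;;
        1) echo "ECT(1)" ;;
        2) echo "ECT(0)" ;;
        3) echo "CE" ;;
    esac
}

# 0x02 is the value legacy stacks used for "minimize monetary cost".
for tos in 0x00 0x01 0x02 0x03; do
    echo "$tos -> $(ecn_codepoint "$tos")"
done
```

This is why a pre-ECN stack that still sets the low-cost bit can look like an ECN-capable sender to a newer router.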

> 
>> Anyone know what the steps would look like to accomplish the above?
> 
> It is going to be highly dependent on what you want to do and what your device is capable of.

I’m assuming a 3.18 kernel or later and iproute2 + iptables.  Nothing else.  And sch_htb is present.
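
With just those pieces, the upstream half looks like the easy one, since it is plain egress shaping on “wan”. A sketch under stated assumptions: the interfaces really are named `guest` and `wan`, the mark value 10 and the class IDs are arbitrary, and the script only prints the commands unless `TC=tc` and `IPT=iptables` are set:

```shell
#!/bin/sh
# Dry run by default; set TC=tc and IPT=iptables (as root) to apply.
TC="${TC:-echo tc}"
IPT="${IPT:-echo iptables}"
GUEST_MARK=10   # arbitrary firewall mark (assumption)

uplink_setup() {
    # Mark forwarded packets that arrived from the guest network.
    $IPT -t mangle -A FORWARD -i guest -o wan -j MARK --set-mark "$GUEST_MARK"
    # Egress tree on wan: 10 mbps total, unclassified goes to production.
    $TC qdisc add dev wan root handle 1: htb default 20
    $TC class add dev wan parent 1: classid 1:1 htb rate 10mbit ceil 10mbit
    # guest: hard cap at 2 mbps (rate equals ceil).
    $TC class add dev wan parent 1:1 classid 1:10 htb rate 2mbit ceil 2mbit
    # production: guaranteed 8 mbps, may borrow up to 10 mbps.
    $TC class add dev wan parent 1:1 classid 1:20 htb rate 8mbit ceil 10mbit
    # Steer packets carrying the guest mark into the guest class.
    $TC filter add dev wan parent 1: protocol ip handle "$GUEST_MARK" fw flowid 1:10
}

uplink_setup
```

Traffic originating on the router itself never passes through FORWARD, so it would land in the default (production) class here.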

> 
> I have an idea of what I would do if I were to implement this on a standard Linux machine functioning as the router.
> 
> 1st:  Address the fact that you can only effectively rate limit what you send.  So, change the problem so that you rate limit what is sent to your router.  I would do this by having the incoming connection go into a Network Namespace and a new virtual connection to the main part of the router.  This Network Namespace can then easily rate limit what it sends to the main part of the router, on a single interface.

This is the same problem that ifb solves, right?

I’m not sure I want to assume that Namespaces are available in all scenarios.
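
For comparison, the ifb route usually looks something like the following. This is only a sketch of the common idiom, assuming the public interface is named `wan`, that the ifb module is available, and that `ifb0` is free to use; it prints the commands unless `TC=tc` and `IP=ip` are set:

```shell
#!/bin/sh
# Dry run by default; set TC=tc and IP=ip (as root) to apply.
TC="${TC:-echo tc}"
IP="${IP:-echo ip}"

ifb_redirect() {
    # Create and bring up the intermediate functional block device.
    $IP link add ifb0 type ifb
    $IP link set dev ifb0 up
    # Attach the ingress qdisc to wan ...
    $TC qdisc add dev wan handle ffff: ingress
    # ... and redirect every incoming IPv4 packet to ifb0, where an
    # ordinary egress tree (e.g. HTB) can then shape it.
    $TC filter add dev wan parent ffff: protocol ip u32 match u32 0 0 \
        flowid 1:1 action mirred egress redirect dev ifb0
}

ifb_redirect
```

One catch, as far as I know: the redirect happens before netfilter sees the packet, so iptables marks are not yet set when the classifier on ifb0 runs.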

> 
>              +------------------------+
> (Internet)---+-eth5  router  eth{0,1}-+---(LAN)
>              +------------------------+
> 
>              +--------------------+-------------------------+
> (Internet)---+-eth5  NetNS  veth0=|=veth5  router  eth{0,1}-+---(LAN)
>              +--------------------+-------------------------+
> 
> This has the advantage that the QoS tree in the NetNS only needs to deal with sending on one interface, veth0.
> 
> This has the added advantage that QoS tree won't be applied to traffic between production and guest.  (Or you don't need to make the QoS tree /more/ complex to account for this.)

Yeah, for now I’m not concerned about internal traffic.  Yet.
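
For the record, the namespace layout in the diagram above could be wired up roughly as follows. This is a sketch only: the namespace name `wanns` is made up, the interface names are taken from the diagram, addressing and forwarding inside the namespace are left out entirely, and the commands are printed rather than run unless `IP=ip` is set:

```shell
#!/bin/sh
# Dry run by default; set IP=ip (as root) to apply.
IP="${IP:-echo ip}"
NS=wanns   # hypothetical namespace name

netns_setup() {
    $IP netns add "$NS"
    # veth pair: veth0 lives in the namespace, veth5 stays in the router.
    $IP link add veth0 type veth peer name veth5
    # Move the Internet-facing NIC and one end of the pair into the namespace.
    $IP link set eth5 netns "$NS"
    $IP link set veth0 netns "$NS"
    $IP netns exec "$NS" ip link set veth0 up
    # The download tree then hangs off veth0 as ordinary egress shaping.
    $IP netns exec "$NS" tc qdisc add dev veth0 root handle 1: htb default 20
}

netns_setup
```

The point of the layout is that everything sent toward the router leaves the namespace through veth0, so a single egress tree there covers all download traffic.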

> 
> 2nd:  Don't worry about bucketing.  Define a minimum that each traffic class is guaranteed to get if it uses it.  Then allow the other traffic class to use whatever bandwidth the first traffic class did not use.

Agreed.

> 
> Why limit guest to 10 Mbps if production is only using 5 Mbps.  That's 35 Mbps of available download that's wasted.

As I said, I don’t want to have to explain to anyone later that “35mbps might have been available Sunday, but today I’m running Carbonite and it’s hogging all the bandwidth while I download these 10 new VMs I created this morning, so suck it.”

> 
> 3rd:  The nature of things, TCP in particular, is to keep bumping into the ceiling.  So if you artificially lower the ceiling, traffic coming in /will/ go over the limit.  Conversely, the circuit is limited at 50 Mbps inbound.  That limit is enforced by the ISP.  There is no way that the traffic can go over it.

No, but it can cause other traffic destined to the production network to get dropped, which is the scenario I’m trying to avoid.

As I remember, some of the newer (model-based) congestion avoidance algorithms (like BBR) were really much better at fairness and avoiding dropped packets…

> 
>> A bunch of people responded, “yeah, I’ve been wanting to do that too…” when I brought up my question, so if I get a good solution I’ll submit a FAQ entry.
> 
> Cool.
> 
>> Thanks,
> 
> You're welcome.
> 
> Good luck.

Thanks.

-Philip

Linux Advanced Routing and Traffic Control