Re: tc question about ingress bandwidth splitting

Linux Advanced Routing and Traffic Control

On 3/24/20 12:51 AM, Philip Prindeville wrote:
Hi Grant,

Hi,

Well, it’s exactly because it *isn’t* 1Gbps each direction that I need good shaping. I could get more, but I’d also pay more.

Fair enough.

No. The idea being that “guest” relies on the kindness of strangers… whereas “production” has a guaranteed SLA of at least 40/8 mbps.

QoS has the ability to guarantee an SLA of 40 & 8 to production.

Think about it this way:

1)  Production gets up to its SLA.
2)  Guest gets up to its SLA.
3)  Production and/or guest get any unused bandwidth.

Each class is guaranteed its SLA, and can optionally use any remaining bandwidth (the unused bandwidth of the other classes).
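
In HTB terms, that's "rate" (the guarantee) vs "ceil" (how far a class may borrow). A rough, untested sketch for the upload direction; eth0 as the WAN interface, the 10 Mbps total uplink, and the guest subnet are my assumptions:

    # Upload direction, shaped on the WAN interface (eth0 assumed).
    # Unclassified traffic falls into the production class (1:10).
    tc qdisc add dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:1 htb rate 10mbit ceil 10mbit
    # Production: guaranteed 8 Mbps, may borrow up to the full link.
    tc class add dev eth0 parent 1:1 classid 1:10 htb rate 8mbit ceil 10mbit
    # Guest: guaranteed 2 Mbps, may borrow up to the full link.
    tc class add dev eth0 parent 1:1 classid 1:20 htb rate 2mbit ceil 10mbit
    # Classify guest traffic by source subnet (192.168.2.0/24 is made up).
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
        match ip src 192.168.2.0/24 flowid 1:20

You'd obviously want filters that match however you actually distinguish the two networks.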

Right. In this case I’m limiting (or pacing) the ACKs so that the sender paces his data.

That's not what I was referring to.

QoS can rate limit what is sent out the internal interfaces at the 40 / 10 Mbps values.

The thing that it cannot do is rate limit what comes in on the outside interface. There may be ~12 Mbps of incoming traffic for guests, but the router will only send 10 Mbps of that out its inside interface. Thus the router is rate limiting what guest receives to 10 Mbps. It's just that there is an additional 2 Mbps that the router is dropping on the floor.
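
As a sketch (untested; eth2 as the guest-facing LAN interface is my assumption), shaping the guest download path is just an egress qdisc on the inside interface:

    # Download direction for guest, shaped going *out* the guest-facing
    # LAN interface (eth2 assumed). Anything over 10 Mbps queues on the
    # router and is eventually dropped before it ever reaches the guests.
    tc qdisc add dev eth2 root handle 1: htb default 10
    tc class add dev eth2 parent 1: classid 1:10 htb rate 10mbit ceil 10mbit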

Does that make sense?

For UDP not at all. For TCP you can apply back pressure, as above. If the sender has filled his window, and I hold back any ACKs, he can’t send anything more until I do send an ACK.

See above.

It's possible for a router to use QoS to rate limit any type of traffic. The router quite literally receives the traffic on one interface and sends it out another interface. The rate at which that traffic is sent back out is what gets limited.

TCP, UDP, ICMP, it doesn't matter what type of traffic.

Correct. Eventually the sender will back off in an attempt to reach a congestion-free steady state.

I would bet that a "congestion-free steady state" is /never/ achieved. The very design of most protocols is to send as fast as possible. When they detect errors, they /may/ slow down /for/ /a/ /little/ /while/. But they will speed back up.

Even if a given flow could achieve something resembling a congestion-free steady state, the nature of Internet traffic is so inconsistent that you have flows starting & stopping all the time. Thus you have wildly shifting bandwidth demands.

My scenario, as I said, is a SoHo router. I don’t have a lot of servers behind it that receive bursts of incoming traffic asynchronously from outside (other than email, which I host locally).

IMHO servers are actually less of a problem than the average person surfing the web.

Every single web page you go to is at least one new and short-lived flow. Many web pages are hundreds of new and short-lived flows, most of which start at about the same time.

The more web surfers you have, the more of these traffic patterns you see. It's also very random when they happen: anywhere between zero and everyone on your network could be doing this at the same time.

Also, multiple windows / tabs mean that more and more of these can happen at the same time.

If my daughter decides to watch an HD movie on an iPad during the day while I’m working, I don’t want that traffic overrunning my network and causing me to not be able to work. In that scenario, the connection is originating internally and going outbound, and it’s long-lived (where "long-lived" is any duration of 20 or more RTTs).

That's one of the things that QoS is quite good at dealing with.

Though I question how long-lived your daughter's streams actually are. I know for a fact that YouTube is a series of small downloads, so each download is relatively short-lived. It's not one long-lived connection that lasts for the duration of the video.

There's also the fact that YouTube prefers QUIC, which is UDP-based, over TCP when it can use it.

Only slightly less for me: I did a traffic-shaper plugin for Arno’s Internet Firewall (AIF) about 12 years ago. I’ve since forgotten everything.

Tempus fugit.

Yup.  And I’m hoping to be able to not need ifb to do it.

I forgot about ifb. I think it would do something similar to what I was suggesting with network namespaces. Though I do wonder how complicated having multiple things in the same namespace would make the tc rules.
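
For reference, the usual ifb recipe is something like this (untested sketch; eth0 assumed as the WAN interface, and the classic u32 catch-all used instead of matchall for the sake of older kernels):

    # Redirect WAN ingress through ifb0, so a normal egress qdisc
    # (e.g. the HTB tree above) can be attached to ifb0.
    modprobe ifb numifbs=1
    ip link set dev ifb0 up
    tc qdisc add dev eth0 handle ffff: ingress
    tc filter add dev eth0 parent ffff: protocol all u32 \
        match u32 0 0 action mirred egress redirect dev ifb0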

Sure, for the total. I meant “guest” bursting over his allotted 10/2 mbps for a short duration, say 600ms (I came up with that as being 5 RTTs of 120ms). I figure that’s enough for slow-start to ramp up into steady state…

See above comments about steady state.

Well, now you’ve got me confused. Because if each can borrow from the other, where’s the SLA? Where’s the cap? Who gets prioritized?

I think I explained it above.

Each is guaranteed the availability of its SLA. The unused bandwidth over the SLA is (can be) fair game.

Meaning that if production is using 15 & 3, there is 25 & 5 that guest could use if allowed to.

Similarly, if guests are sleeping, there is an additional 10 & 2 that production could take advantage of.

I could be completely unshaped, and have both borrowing from each other… which is the degenerate case.

That's why each is guaranteed its SLA *FIRST* and then can use whatever is unused *SECOND*. This allows optimal use of the bandwidth while still guaranteeing SLAs.

Yeah, and indeed that’s what HTB excels at.

Yep.

If memory serves, HTB is one of many that can do it. But HTB was one of the earlier options.

Agreed.

Although… in the case of the “guest” network, I don’t ever want it performing better than the hard SLA of 10/2 mbps, or people will complain when they don’t get extra bandwidth. If they’re conditioned to think that “I’m on the guest network, and 10/2 mbps is all I’m going to get” then they’ll be happy with it and won’t complain.

Okay.

That is a hard policy decision that you are making. I have no objection to that. It also means that guest doesn't get to borrow unused bandwidth from production.
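
In HTB terms that's just setting the guest class's ceil equal to its rate. Continuing the earlier (untested) upload sketch:

    # Guest upload, hard capped: ceil == rate, so no borrowing.
    tc class change dev eth0 parent 1:1 classid 1:20 htb rate 2mbit ceil 2mbit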

I don’t want to hear, “well, this was so much better two days ago!”

My answer is, “It’s free. You’re getting it by someone else’s good graces… be grateful you’re getting anything at all.”

ACK

Some ISPs were actually squashing the bits, and got spanked severely by the FCC.

Okay. I don't recall that. I wonder why they wanted to stomp on ECN, especially since ECN exists to signal congestion. Lack of congestion notification encourages additional ramp up.

I'm assuming that ISPs were clearing ECN. Maybe I have this backwards. Maybe they were artificially setting it to induce slowdowns.

Also, some older routers’ IP stacks were not ECN aware, and had the older bit definitions (remember that RFC 3168 and ECN borrowed the ECT1 bit from TOS/LOWCOST from RFC 791 and 1349).

My experience has been that most routers ignore QoS / ECN.

I’m assuming a 3.18 kernel or later and iproute2 + iptables. Nothing else. And sch_htb is present.

Unfortunately there are a LOT of possible combinations in that mix.

I also know that the Ubiquiti EdgeRouter Lite uses a 2.6 kernel. I don't know about other EdgeOS devices (EdgeOS being Ubiquiti's Linux distro). But I wouldn't be surprised to learn that EdgeOS is 2.6, period.

This is the same problem that ifb solves, right?

Probably.  (See above.)

I’m not sure I want to assume that Namespaces are available in all scenarios.

Fair enough.

I run old PCs with standard Linux distros as routers. So I can easily add what I want to them.

Yeah, for now I’m not concerned about internal traffic.  Yet.

That's something to keep in mind when creating the QoS configuration: you may need to take internal traffic into account and make sure that you don't artificially slow it down.
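
One way to handle that (a sketch on my part; eth1 as a production LAN interface, the subnet, and the 900mbit figure are all made up) is a filter that steers internally-sourced traffic into a class that isn't limited to the WAN rate:

    # On a shaped LAN interface, let LAN-to-LAN traffic bypass the
    # WAN-rate classes (assumes an HTB root with handle 1: already
    # exists on eth1).
    tc class add dev eth1 parent 1: classid 1:99 htb rate 900mbit
    tc filter add dev eth1 parent 1: protocol ip prio 1 u32 \
        match ip src 192.168.0.0/16 flowid 1:99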

Agreed.

As I said, I don’t want to have to explain to anyone later that “35mbps might have been available Sunday, but today I’m running Carbonite and it’s hogging all the bandwidth while I download these 10 new VMs I created this morning, so suck it.”

ACK

No, but it can cause other traffic destined to the production network to get dropped, which is the scenario I’m trying to avoid.

I understand.

As I remember, some of the newer (model-based) congestion avoidance algorithms (like BBR) were really much better at fairness and avoiding dropped packets…

My understanding is that BBR is rather aggressive in that it tries to identify what the available bandwidth is and then use as much of it as it can. It's particularly aggressive at probing for that bandwidth, too.

See above comments about the transient nature of flows.

Thanks.

You're welcome.



--
Grant. . . .
unix || die
