Re: Naive question on multiple TCP/IP channels and please dont start a uS NN debate here unless you really want to.

Phillip Hallam-Baker <phill@xxxxxxxxxxxxxxx> · Fri, 6 Feb 2015 10:31:04 -0500

On Fri, Feb 6, 2015 at 8:47 AM, Jim Gettys <jg@xxxxxxxxxxxxxxx> wrote:

On Thu, Feb 5, 2015 at 6:34 PM, Richard Shockey <richard@xxxxxxxxxx> wrote:

OK yes, My Netflix download is going to kill your VOIP call.

Yes, it may, though pacing traffic may help (see the sched_fq work in Linux).

RS > Well, it's always been more subtle than you think. You have to distinguish between a Voice OVER IP call (Skype etc.) and a “managed service” like Voice USING IP, or VuIP, which is an entirely different beast. Modern VuIP, which covers all of cable voice in Europe and the US, VZ FIOS etc., and VoLTE, is managed and uses IP technologies such as SIP/IMS, but the routing may or may not have anything to do with BGP. And BTW you still have to do the first-order number translation as well, AKA RFC 6116 ENUM or something new, which we are actually debating in RAI.

It's segmented managed traffic. It's not only Netflix killing Skype; it's Microsoft, Apple, and Android updates as well. We have no visibility into how the OS actually queues application packets, if at all.

Yup; this is a problem in the upstream direction (so it is not the case you state above but the inverse case, typified by events such as someone emailing a pile of images to your friends/family, backups, and similar operations).

One of the most common bottlenecks is the WiFi or cellular hop, and if the operating system uses a single bloated FIFO drop-tail queue discipline, you get into trouble. On Linux, this queue discipline has historically been one called PFIFO_FAST, which implements a large (typically 1024 packet) FIFO drop-tail queue (with a little bit of diffserv thrown in for good measure).
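
To put a rough number on why a large FIFO drop-tail queue hurts, here is a minimal Python sketch of the worst-case wait behind a full queue. The 1024 packet figure comes from the text above; the 1500-byte packet size and the uplink rates are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def full_queue_delay_seconds(queue_packets: int,
                             packet_bytes: int,
                             link_bits_per_second: float) -> float:
    """Time a packet waits if it arrives behind a completely full FIFO.

    Every queued packet ahead of it must be serialized onto the link first.
    """
    return queue_packets * packet_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    # Assumed values: 1500-byte packets and three illustrative uplink rates.
    for mbps in (1, 10, 100):
        delay = full_queue_delay_seconds(1024, 1500, mbps * 1_000_000)
        print(f"{mbps:>3} Mbit/s uplink: {delay:.3f} s behind a full queue")
```

Under these assumptions a full 1024 packet queue on a 1 Mbit/s uplink holds about 12.3 seconds of traffic, which is far more than an interactive voice call can tolerate.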

Turning on a different queue discipline is a single configuration line, and it appears that people have been deciding to make fq_codel the default in various Linux distributions as of last fall (it has been the default in OpenWrt on routers for a while).
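
As one possible reading of the "single configuration line": on Linux kernels that expose the net.core.default_qdisc sysctl, the default can be set with one line in the sysctl configuration. The message does not say which mechanism was meant, so treat this as an assumption. The Python sketch below only builds that line and reads the current value back from /proc; the helper names are hypothetical, and the read returns None where the file is absent.

```python
from pathlib import Path
from typing import Optional

# Path of the sysctl on kernels that provide it (assumption: it may be
# missing on older kernels or non-Linux systems).
DEFAULT_QDISC_PATH = Path("/proc/sys/net/core/default_qdisc")


def sysctl_line(qdisc: str) -> str:
    """Build the one-line sysctl setting for a given queue discipline."""
    return f"net.core.default_qdisc = {qdisc}"


def read_default_qdisc(path: Path = DEFAULT_QDISC_PATH) -> Optional[str]:
    """Return the current default qdisc name, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("current default:", read_default_qdisc())
    print("line to configure fq_codel:", sysctl_line("fq_codel"))
```

The sketch deliberately does not write the setting, since changing it needs root and affects only interfaces brought up afterwards.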

At some point, it may make sense to use a different queue discipline (sched_fq) as the default, but I think more testing is needed.  That's a bit of a discussion better left to a different message.

OK we have a technical fix. But the problem is how to get it deployed by the broadband providers.

There is a very good paper on computer security that might be relevant here: 'Folk models of home computer security'.

http://www.rickwash.com/papers/rwash-homesec-soups10-final.pdf

One of the things we have found at Comodo is that virtually every home computer issue is almost automatically attributed to being 'a virus'. What the customer actually wants is for their damn machine to work. Or alternatively, they want their relatives to stop calling them to ask them to fix their computers.

The point is that people tend to leap to external attack as being the most likely cause of any computer failure issue that isn't obviously dead hardware. Which is really weird when I spent most of the 90s being told I was an evil scaremonger for suggesting those cuddly-wuddly hackers might actually be rather nasty thieving types.

So people see their Netflix or their Vonage suddenly sputter and the first explanation that comes into their heads is 'my ISP is trying to kill them to sell me their stuff instead'.

We have the potential here for a really bitter dispute. Instead of picking up the phone to complain to their ISP, people have been picking up the phone to complain to their Senator or the guy in the White House.

And it is going to be really difficult to explain to a lot of people who have taken up arms on either side of this dispute that this might be the cause of the slowdowns. One side is going to accuse us of being shills for the corporate interests. And on the other side there are a lot of lobbyists licking their chops at the thought of fat billable hours for as long as they can make the fight worse.

So how do we de-escalate the situation?

One part is that we need more than a technical fix for this problem. We need to be able to tell Joe or Jane Consumer how often these slowdowns occur and what the cause is. The difficulty is that the cause might be on either the broadband provider side or the home user side of the network.

So maybe have the residential gateway collect some data and expose it to the consumer in some fashion. This could also help debug other connection issues. I was having unexplained network slowdowns for a month that were eventually found to have been caused by a falling tree snapping the fiber but not the sheath round it. So the result was a flaky connection that had the peculiar property of working for some frequencies but not others.
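
As a rough illustration of what "collect some data and expose it" could look like, here is a minimal Python sketch that turns a series of round-trip-time samples into a list of slowdown episodes a consumer could read. Everything in it is hypothetical: the sample format, the baseline, and the 3x threshold are assumptions, not anything a real gateway is known to implement.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Episode(NamedTuple):
    start_s: float      # timestamp of the first slow sample
    end_s: float        # timestamp of the last slow sample
    peak_rtt_ms: float  # worst round-trip time seen in the episode


def find_slowdowns(samples: Sequence[Tuple[float, float]],
                   baseline_ms: float,
                   factor: float = 3.0) -> List[Episode]:
    """Group consecutive samples whose RTT exceeds factor * baseline.

    `samples` is a time-ordered sequence of (timestamp_s, rtt_ms) pairs.
    """
    threshold = baseline_ms * factor
    episodes: List[Episode] = []
    current = None
    for timestamp, rtt in samples:
        if rtt > threshold:
            if current is None:
                current = Episode(timestamp, timestamp, rtt)
            else:
                current = Episode(current.start_s, timestamp,
                                  max(current.peak_rtt_ms, rtt))
        elif current is not None:
            # A normal sample closes the episode in progress.
            episodes.append(current)
            current = None
    if current is not None:
        episodes.append(current)
    return episodes


if __name__ == "__main__":
    # Made-up samples: (seconds since start, RTT in milliseconds).
    demo = [(0, 20), (1, 25), (2, 400), (3, 900), (4, 30), (5, 22), (6, 150)]
    for ep in find_slowdowns(demo, baseline_ms=20):
        print(f"slow from t={ep.start_s}s to t={ep.end_s}s, "
              f"peak {ep.peak_rtt_ms} ms")
```

A log like this says when slowdowns happened, not whose side caused them; attributing the cause would need separate measurements on the home side and the provider side of the gateway.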

Another is to point out that buffer bloat explaining some network slowdowns does not mean it explains every slowdown, just as a printing failure caused by the wrong printer port assignment rather than a virus does not mean viruses do not exist. Nor does it mean that malice is never a cause.