Toke,

Sorry for not yet sending the follow-up. Straight after that email, I got roped into becoming a makeshift ambulance driver and then ... long story...

Thanks for taking my comments constructively, as intended. Responses embedded.

On 18/03/16 12:47, Toke Høiland-Jørgensen wrote:
This is perhaps because "we" (i.e. the people looking) tend to have significantly more bandwidth than the majority of Internet users (those in the developing world). When you have less bandwidth, long-running flows last longer, so they tend to overlap more. Given bloat problems are only seen intermittently in the first place [Hohlfeld14], the average person isn't going to see these limitations very often. But if you are a homeworker using a VPN (for instance), you will be dogged by these problems all the time.

Hi Bob,

Thank you for your timely and constructive comments. Please see the inline responses below.

My main concern is with applicability. In particular, the sentence in section 7 on Deployment Status: "We believe it to be a safe default and encourage people running Linux to turn it on: ..." and a similar sentiment repeated in the conclusions: "and we believe it to be safe to turn on by default, as has already happened in a number of Linux distributions." Can one of the authors explain why a solution with the limitations in section 6 can still be described as "safe"?

"We believe it to be a safe default" means that we have not seen any of the theoretical limitations we have documented in section 6 be a concern *in practice* in any of the extensive number of deployments FQ-CoDel has seen already. And that the benefits of turning on FQ-CoDel are sufficient that nudging people in that direction is a good idea.

So the main problem here is with the assumption that the test has to be "whether we observe these limitations in practice". Few people observed problems with NATs at the time they were introduced (otherwise they wouldn't have sold successfully). So those arguing against them tended to be ignored by mainstream comms engineers. But then the "theoretical" limitations started to bite. And we ended up having to make do with a subset of the potential of the Internet.
Those sounding the warning bells could see the potential of the Internet, and they could see how NATs would close that off. Those ignoring the warning bells believed they were right to only be concerned with the here and now.

My concern is about precluding future desirable developments in application behaviour. It will be rare to observe such cases by random inspection; they may not appear while using existing applications on existing high speed links. But they will occur very frequently in scenarios prone to them. That's often the nature of side-effects.

My concern is particularly about fq technology in the network precluding improvements in the quality of regular best efforts service that we can expect through changes in applications and transports alone.

When I was arguing against FQ_CoDel (back in 2013 at the latency workshop - you were there too), numerous people were saying that FQ_CoDel is much more subtle than regular FQ. At which point I quietened down, because I trusted enough of those people. However, in the recent tests with HAS (criticised at length elsewhere), one thing that can be said with certainty was that FQ_CoDel just becomes a regular fq scheduler when you have two or more long-running flows that can always keep their queues from emptying. Whatever instantaneous rate the application tries to run at, FQ overrides it and runs at 1/N of the capacity. That is not good for a video coming off a camera at a variable information rate. FQ skims off all the peaks, so the VBR codec adapts down to the worst-case peak rate, not the worst-case average rate.
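To put rough numbers on the 1/N point, here is a toy Python sketch. It is only an illustration under stated assumptions, not a model of the FQ_CoDel implementation: the capacity, the flow count and the demand trace are made up, the competing flows are assumed to be permanently backlogged, and the video is assumed to be live, so anything above its instantaneous share counts as lost rather than delayed. The function names are hypothetical.

```python
def fq_share(capacity, n_backlogged):
    """Per-flow rate when all n flows keep their queues from emptying."""
    return capacity / n_backlogged


def deliver(demand, share):
    """Clip each interval's demand to the per-flow share.

    Assumption: no buffering of the excess, as for a live video
    that cannot tolerate queuing delay.
    """
    return [min(d, share) for d in demand]


# Hypothetical numbers, in Mb/s.
capacity = 10.0
n_flows = 2                      # the video plus one long-running flow
share = fq_share(capacity, n_flows)

# Made-up VBR trace: mostly quiet, with occasional peaks.
vbr_demand = [3.0, 3.0, 8.0, 3.0, 3.0, 8.0]
delivered = deliver(vbr_demand, share)

mean_demand = sum(vbr_demand) / len(vbr_demand)
clipped = sum(d - x for d, x in zip(vbr_demand, delivered))

print("per-flow share:", share)
print("mean demand:", round(mean_demand, 2))
print("delivered:", delivered)
print("clipped at peaks:", clipped)
```

In this made-up trace the video's average demand is below its share, yet every peak is still cut down to the share, which is the skimming effect described above.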
Well, stating the limitations in the draft, then denying their truth in the conclusions by using the word safe to describe them is classic Orwellian Newspeak.

Indeed, these sentences seem rather Orwellian.

I can assure you that we are not attempting to exert "draconian control by propaganda, surveillance, misinformation, denial of truth, and manipulation of the past" (quoting https://en.wikipedia.org/wiki/Orwellian here). But thank you for implying it :)

The experience that led me to understand this problem was when a bunch of colleagues tried to set up a start-up (a few years ago now) to sell a range of "equitable quality" video codecs (i.e. constant quality variable bit-rate instead of constant bit-rate variable quality). Then, the first ISP they tried to sell to had WFQ in its Broadband remote access servers. Even though this was between users, not flows, when video was the dominant traffic, this overrode the benefits of their cool codecs (which would have delivered twice as many videos with the same quality over the same capacity).

Would it not be correct instead to say that FQ_CoDel has been made the default in a number of Linux distributions despite not being safe in some circumstances?

At the time it was made the default in OpenWrt (several years ago now, if memory serves me right), there was not a whole lot of real-world deployment experience, due to the chicken-and-egg problem of not wanting to change the default before we have gathered more experience. However, today the situation is quite different, thanks in part to the boldness of the OpenWrt devs. So no, I do not believe that to be the case any longer.

Now, by your test, you will never see the limitations these videos suffered. Because they never got developed. Because the developers gave up. You can think of FQ_CoDel as nice well-meaning people (the Linux community) creating a new middlebox problem.

OK, I have seen such figures, and it makes sense that FQ will give single RTT flows very low latency.

2. Default?
If a draft saying "We believe it to be a safe default..." is published as an RFC, it means "The IETF/IESG/etc believes..." Only one solution can be default, so if the IETF says that FQ_CoDel is a safe default, and no other AQM RFC makes any claim to being a safe default (which they do not at the moment), it could be read as the IETF recommending FQ_CoDel for default status and, by implication, other AQMs (like PIE, say) are not recommended for default status.

This is certainly not my reading. This is an experimental RFC saying "we believe it to be safe as a default" not a standards track RFC saying "this should be the default". This is an important difference; we are not mandating anything, but rather expressing our honest opinion on the applicability of FQ-CoDel as a default, should anyone wish to make it one in their domain.

As far as I know, unlike the listed FQ_CoDel limitations, no limitations of PIE have been identified. I don't think anyone is claiming that the performance of FQ_CoDel is awesomely better than PIE. Maybe a bit better, maybe a bit worse, depending on circumstances, and depending on which you value most out of low queuing delay, high utilization, or low loss.

Well, for CoDel and PIE that is certainly true. But FQ-CoDel in many cases reduces latency under load by an order of magnitude compared to both of them, while improving throughput.

My concern is that of course the IESG will want to sign off an RFC with this cool performance, given they read that the limitations are not important. Whereas I believe the limitations have been downplayed. If we don't want the IETF (or the AQM WG) to make this call, we should make it clear that we are not making this call.

So, if the authors want the IETF to recommend a default AQM on the basis of safety (and I agree safety is the most important factor when choosing a default), the most likely candidate would be PIE, wouldn't it?
FQ_CoDel has unintended side-effects, which implies it is not a good candidate for default; it should only be configured deliberately by those who can live with the side-effects.

I'm not sure it would be possible for the AQM group to agree on a recommendation for a default. But I suppose it might be a good bikeshedding exercise. And as noted above, this is not what we intend to do in this case.

My concern is that, years down the line, when the context has been lost, these sentences could be interpreted as making this call. For comparison, consider how we have been trying to understand what RFC 2309 (the RED manifesto) intended to say.

See the next email (like I promised before).

3. A Detail

I also have a concern about the way the limitations are written (typically, each limitation is stated, followed by an arm-waving qualification attempting to create an impression that there is not really a limitation). To keep the thread clean, I'll send that in a follow-up email.

It is certainly not our intention to "create an impression that there is not really a limitation". Rather, we are trying to suggest ways in which each limitation can be mitigated by people who are concerned about it, but still want to realise the benefits of deploying FQ-CoDel. Sure, some of those proposals are not exactly at the "running code" stage, but dismissing them as arm-waving is hardly fair.

I'll add, as I noted initially, that many of the limitations we have noted are of a theoretical nature (in the sense that we are not aware of any deployments where they have caused issues in practice). This does not make it any less important to document them, of course, and we have been grateful for the feedback from the working group that the section grew out of (you yourself were among the people providing this feedback, I believe). However, this also means that it is difficult to do more than point out each issue. We can't quantify them, for instance.
If you have concrete suggestions for language that would make things clearer, do tell (though I suppose that's exactly what you'll do in your follow-up mail). :)

Cheers

Bob

[Hohlfeld14] Hohlfeld, O., Pujol, E., Ciucu, F., Feldmann, A. & Barford, P., "A QoE Perspective on Sizing Network Buffers," In: Proc. Internet Measurement Conf (IMC'14) pp.333-346 ACM (November 2014)

-Toke

--
________________________________________________________________
Bob Briscoe http://bobbriscoe.net/