Re: [aqm] Last Call: <draft-ietf-aqm-fq-codel-05.txt> (FlowQueue-Codel) to Experimental RFC

Bob Briscoe <research@xxxxxxxxxxxxxx> · Mon, 21 Mar 2016 18:04:30 +0000

    Toke,

    Sorry for not yet sending the follow-up. Straight after that email,
    I got roped into becoming a makeshift ambulance driver and then ...
    long story...

    Thanks for taking my comments constructively, as intended. Responses
    embedded.

    On 18/03/16 12:47, Toke Høiland-Jørgensen wrote:

      Hi Bob

Thank you for your timely and constructive comments. Please see the
inline responses below.

        My main concern is with applicability. In particular, the sentence in
section 7 on Deployment Status: "We believe it to be a safe default
and encourage people running Linux to turn it on: ...", and a similar
sentiment repeated in the conclusions: "and we believe it to be safe
to turn on by default, as has already happened in a number of Linux
distributions."

Can one of the authors explain why a solution with the limitations in
section 6 can still be described as "safe"?

      "We believe it to be a safe default" means that we have not seen any of
the theoretical limitations we have documented in section 6 be a concern
*in practice* in any of the extensive number of deployments FQ-CoDel has
seen already. And that the benefits of turning on FQ-CoDel are
sufficient that nudging people in that direction is a good idea.

    This is perhaps because "we" (ie the people looking) tend to have
    significantly more bandwidth than the majority of Internet users
    (those in the developing world). When you have less bandwidth,
    long-running flows last longer, so they tend to overlap more. Given
    bloat problems are only seen intermittently in the first place
    [Hohlfeld14], the average person isn't going to see these
    limitations very often. But if you are a homeworker using a VPN (for
    instance), you will be dogged by these problems all the time.

    So the main problem here is with the assumption that the test has to
    be "whether we observe these limitations in practice".

    Few people observed problems with NATs at the time they were
    introduced (otherwise they wouldn't have sold successfully). So
    those arguing against them tended to be ignored by mainstream comms
    engineers. But then the "theoretical" limitations started to bite.
    And we ended up having to make do with a subset of the potential of
    the Internet. Those sounding the warning bells could see the
    potential of the Internet, and they could see how NATs would close
    that off. Those ignoring the warning bells believed they were right
    to only be concerned with the here and now.

    My concern is about precluding future desirable developments in
    application behaviour. It will be rare to observe such cases by
    random inspection; they may not appear while using existing
    applications on existing high speed links. But, they will occur very
    frequently in scenarios prone to them. That's often the nature of
    side-effects.

    My concern is particularly about fq technology in the network
    precluding improvements in the quality of regular best efforts
    service that we can expect through changes in applications and
    transports alone.

    When I was arguing against FQ_CoDel (back in 2013 at the latency
    workshop - you were there too), numerous people were saying that
    FQ_CoDel is much more subtle than regular FQ. At which point I
    quietened down, because I trusted enough of those people. However,
    in the recent tests with HAS (criticised at length elsewhere), one
    thing that can be said with certainty was that FQ_CoDel just becomes
    a regular fq scheduler when you have two or more long-running flows
    that can always keep their queues from emptying. Whatever
    instantaneous rate the application tries to run at, FQ overrides it
    and runs at 1/N of the capacity. That is not good for a video coming
    off a camera at a variable information rate. FQ skims off all the
    peaks, so the VBR codec adapts down to the worst-case peak rate, not
    the worst-case average rate.
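
    The 1/N effect described above can be illustrated with a minimal
    sketch. It is not taken from the thread or from any FQ_CoDel
    implementation: the function name fq_served, the demand figures and
    the 10 Mb/interval capacity are all hypothetical, and the model
    assumes the other N-1 flows are always backlogged, so the scheduler
    never offers this flow more than capacity / N in any interval.

```python
def fq_served(demand, capacity, n_flows):
    """Per-interval service for one flow under an idealised fair-queuing
    scheduler, assuming the other n_flows - 1 flows are always backlogged.

    demand   -- data the application produces in each interval (hypothetical)
    capacity -- data the link can carry in each interval
    Returns (served, backlogs): data sent, and data left queued, per interval.
    """
    share = capacity / n_flows  # the 1/N share the scheduler enforces
    backlog = 0.0
    served, backlogs = [], []
    for d in demand:
        offered = backlog + d        # new data plus anything still queued
        s = min(offered, share)      # peaks above the share are clipped
        backlog = offered - s        # the clipped part waits in the flow's queue
        served.append(s)
        backlogs.append(backlog)
    return served, backlogs


# Hypothetical VBR source: average 3.5 per interval, peaks of 8.
demand = [2.0, 2.0, 8.0, 2.0, 2.0, 8.0, 2.0, 2.0]
served, backlogs = fq_served(demand, capacity=10.0, n_flows=2)

print(served)    # [2.0, 2.0, 5.0, 5.0, 2.0, 5.0, 5.0, 2.0]
print(backlogs)  # [0.0, 0.0, 3.0, 0.0, 0.0, 3.0, 0.0, 0.0]
```

    In this toy model the source's average rate (3.5) is well under its
    1/N share (5), yet every peak is held back and queued, which is the
    "skimming" referred to above. How a real codec reacts to that added
    delay is outside what the sketch models.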

        Indeed, these sentences seem rather Orwellian.

      I can assure you that we are not attempting to exert "draconian control
by propaganda, surveillance, misinformation, denial of truth, and
manipulation of the past" (quoting
https://en.wikipedia.org/wiki/Orwellian here). But thank you for
implying it :)

    Well, stating the limitations in the draft, then denying their truth
    in the conclusions by using the word safe to describe them is
    classic Orwellian Newspeak.

        Would it not be correct instead to say that FQ_CoDel has been made the
default in a number of Linux distributions despite not being safe in
some circumstances?

      At the time it was made the default in OpenWrt (several years ago now,
if memory serves me right), there was not a whole lot of real-world
deployment experience, due to the chicken-and-egg problem of not wanting
to change the default before we have gathered more experience. However,
today the situation is quite different, thanks in part to the boldness
of the OpenWrt devs. So no, I do not believe that to be the case any
longer.

    The experience that led me to understand this problem was when a
    bunch of colleagues tried to set up a start-up (a few years ago now)
    to sell a range of "equitable quality" video codecs (ie constant
    quality variable bit-rate instead of constant bit-rate variable
    quality). Then, the first ISP they tried to sell to had WFQ in its
    Broadband remote access servers. Even tho this was between users,
    not flows, when video was the dominant traffic, this overrode the
    benefits of their cool codecs (which would have delivered twice as
    many videos with the same quality over the same capacity).

    Now, by your test, you will never see the limitations these videos
    suffered. Because they never got developed. Because the developers
    gave up. You can think of FQ_CoDel as nice well-meaning people (the
    Linux community) creating a new middlebox problem.

        2. Default?

If a draft saying "We believe it to be a safe default..." is published as an
RFC, it means "The IETF/IESG/etc believes..."
Only one solution can be default, so if the IETF says that FQ_CoDel is a safe
default, and no other AQM RFC makes any claim to being a safe default (which
they do not at the moment), it could be read as the IETF recommending FQ_CoDel
for default status and, by implication, other AQMs (like PIE, say) are not
recommended for default status.

      This is certainly not my reading. This is an experimental RFC saying "we
believe it to be safe as a default" not a standards track RFC saying
"this should be the default". This is an important difference; we are
not mandating anything, but rather expressing our honest opinion on
the applicability of FQ-CoDel as a default, should anyone wish to make
it one in their domain.

        As far as I know, unlike the listed FQ_CoDel limitations, no
limitations of PIE have been identified. I don't think anyone is
claiming that the performance of FQ_CoDel is awesomely better than
PIE. Maybe a bit better, maybe a bit worse, depending on
circumstances, and depending on which you value most out of low
queuing delay, high utilization, or low loss.

      Well, for CoDel and PIE that is certainly true. But FQ-CoDel in many
cases reduces latency under load by an order of magnitude compared to
both of them, while improving throughput.

    OK, I have seen such figures, and it makes sense that FQ will give
    single RTT flows v low latency. 

    My concern is that of course the IESG will want to sign off an RFC
    with this cool performance, given they read that the limitations are
    not important. Whereas I believe the limitations have been
    downplayed.

        So, if the authors want the IETF to recommend a default AQM on the
basis of safety (and I agree safety is the most important factor when
choosing a default), the most likely candidate would be PIE, wouldn't
it? FQ_CoDel has unintended side-effects, which implies it is not a
good candidate for default; it should only be configured deliberately
by those who can live with the side-effects.

      I'm not sure it would be possible for the AQM group to agree on a
recommendation for a default. But I suppose it might be a good
bikeshedding exercise. And as noted above, this is not what we intend to
do in this case.

    If we don't want the IETF (or the AQM WG) to make this call, we
    should make it clear that we are not making this call.

    My concern is that, years down the line, when the context has been
    lost, these sentences could be interpreted as making this call.

    For comparison, consider how we have been trying to understand what
    RFC2309 (the RED manifesto) intended to say.

        3. A Detail

I also have a concern about the way the limitations are written
(typically, each limitation is stated, followed by an arm-waving
qualification attempting to create an impression that there is not
really a limitation). To keep the thread clean, I'll send that in a
follow-up email.

      It is certainly not our intention to "create an impression that there is
not really a limitation". Rather, we are trying to suggest ways in which
each limitation can be mitigated by people who are concerned about it,
but still want to realise the benefits of deploying FQ-CoDel. Sure, some
of those proposals are not exactly at the "running code" stage, but
dismissing them as arm-waving is hardly fair.

I'll add, as I noted initially, that many of the limitations we have
noted are of a theoretical nature (in the sense that we are not aware of
any deployments where they have caused issues in practice). This does not
make it any less important to document them, of course, and we have been
grateful for the feedback from the working group that the section grew
out of (you yourself were among the people providing this feedback, I
believe). However, this also means that it is difficult to do more than
point out each issue. We can't quantify them, for instance.

If you have concrete suggestions for language that would make things
clearer, do tell (though I suppose that's exactly what you'll do in your
follow-up mail). :)

    See the next email (like I promised before).

    Cheers

    Bob

    [Hohlfeld14] Hohlfeld, O., Pujol, E., Ciucu, F., Feldmann, A. &
    Barford, P., "A QoE Perspective on Sizing Network Buffers," In:
    Proc. Internet Measurement Conf (IMC'14) pp.333-346 ACM (November
    2014)

-Toke

    -- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/