Re: [Last-Call] Artart last call review of draft-ietf-tsvwg-ecn-l4s-id-27

Bob --

Thanks for the thoughtful and thorough response to my review.  I have gone through the changes you suggest, and they look good to me. 

On Thu, Aug 4, 2022 at 4:57 PM Bob Briscoe <ietf@xxxxxxxxxxxxxx> wrote:
Bernard,

Thank you for taking the time to produce this extremely thorough review.
Pls see [BB] inline;
You will need an HTML email reader for the diffs in this email.
Alternatively, I've temporarily uploaded a side-by-side diff here:
    https://bobbriscoe.net/tmp/draft-ietf-tsvwg-ecn-l4s-id-28a-DIFF-27.html


On 30/07/2022 00:51, Bernard Aboba via Datatracker wrote:
Reviewer: Bernard Aboba
Review result: On the Right Track

Here are my review comments.  I believe this is quite an important document, so
that making the reasoning as clear as possible is important.  Unfortunately,
the writing and overall organization makes the document hard to follow. If the
authors are open to it, I'd be willing to invest more time to help get it into
shape.

[BB] Thank you. You have already obviously sunk considerable time into it. Often I've found that your proposed alternative text didn't quite mean what we intended. But I've taken this as a sign that we hadn't explained it well and tried to guess what made you stumble.

This draft is in the long tail of many statistics: number of years since first draft, number of revisions, number of pages, etc. etc.
So I hope you will understand that this document has been knocked into all sorts of different shapes already, during a huge amount of WG review and consensus building, which I have tried not to upset, while also trying to understand why you felt it needed further changes.

Overall Comments

Abstract

Since this is an Experimental document, I was expecting the Abstract and
perhaps the Introduction to refer briefly to the considerations covered in
Section 7, (such as potential experiments and open issues).

[BB] Good point - I'm surprised no-one has brought this up before - thanks. I'll add the following:
Abstract:
                   ...to prevent it degrading the low queuing delay and
   low loss of L4S traffic.  This experimental track specification defines the rules that
   L4S transports and network elements need to follow with the intention
   that L4S flows neither harm each other's performance nor that of
   Classic traffic.  It also suggests open questions to be investigated 
   during experimentation.  Examples of new ...

Intro:
There wasn't really a relevant point to mention the Experiments section (§7) until the document roadmap (which you ask for later).
So we added a brief summary of the "L4S Experiments" there (see later for the actual text). The only change to the Intro was the first line:

    This experimental track specification...

Organization and inter-relation between Sections

The document has organizational issues which make it more difficult to read.

I think that Section 1 should provide an overview of the specification, helping
the reader navigate it.

[BB] Section 3 already provides the basis of a roadmap to both this and other documents. It points to §4 (Transports) & §5 (Network nodes).
It ought to have also referred to §6 (Tunnels and Encapsulations), which was added to the draft fairly recently (but without updating this roadmap). We can and should add that.

We could even move §3 to be the last subsection of §1 (i.e. §1.4). Then it could start the roadmap with §2, which gives the requirements for L4S packet identification.
However, a number of other documents already refer to the Prague L4S Requirements in §4, particularly §4.3. I mean not just I-Ds (which can still be changed), but also papers that have already been published. So a pragmatic compromise would be to just switch round sections 2 (requirements) & 3 (roadmap).

Then we could retitle §3 to "L4S Packet Identification: Document Roadmap"
and add brief mentions of the tail sections (§7 L4S Experiments, and the usual IANA and Security Considerations).
The result is below, with manually added diff colouring (given we'd moved the whole section as well, so it's not a totally precise diff).

2.  L4S Packet Identification: Document Roadmap

   The L4S treatment is an experimental track alternative packet marking
   treatment to the Classic ECN treatment in [RFC3168], which has been
   updated by [RFC8311] to allow experiments such as the one defined in
   the present specification.  [RFC4774] discusses some of the issues
   and evaluation criteria when defining alternative ECN semantics,
   which are further discussed in Section 4.3.1.

   The L4S architecture [I-D.ietf-tsvwg-l4s-arch] describes the three
   main components of L4S: the sending host behaviour, the marking
   behaviour in the network and the L4S ECN protocol that identifies L4S
   packets as they flow between the two.

   The next section of the present document (Section 3) records the
   requirements that informed the choice of L4S identifier.  Then
   subsequent sections specify the L4S ECN protocol, which i) identifies
   packets that have been sent from hosts that are expected to comply
   with a broad type of sending behaviour; and ii) identifies the
   marking treatment that network nodes are expected to apply to L4S
   packets.

   For a packet to receive L4S treatment as it is forwarded, the sender
   sets the ECN field in the IP header to the ECT(1) codepoint.  See
   Section 4 for full transport layer behaviour requirements, including
   feedback and congestion response.

   A network node that implements the L4S service always classifies
   arriving ECT(1) packets for L4S treatment and by default classifies
   CE packets for L4S treatment unless the heuristics described in
   Section 5.3 are employed.  See Section 5 for full network element
   behaviour requirements, including classification, ECN-marking and
   interaction of the L4S identifier with other identifiers and per-hop
   behaviours.
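[Editor's aside: the default classification behaviour described in the paragraph above can be sketched in a few lines. The codepoint values are the real ECN-field codepoints from RFC 3168; the function itself is a hypothetical illustration, not text from the draft.]

```python
# Hypothetical sketch of the default L4S classification described above.
# Codepoint values are from RFC 3168; the classify() function and its
# parameter names are illustrative only, not taken from the draft.

NOT_ECT = 0b00  # Not ECN-Capable Transport
ECT1    = 0b01  # ECN-Capable Transport(1) -- the L4S identifier
ECT0    = 0b10  # ECN-Capable Transport(0) -- Classic ECN
CE      = 0b11  # Congestion Experienced

def classify(ecn, heuristics_enabled=False, ce_looks_classic=False):
    """Return which treatment ('L4S' or 'Classic') a packet receives."""
    if ecn == ECT1:
        return "L4S"              # ECT(1) is always classified as L4S
    if ecn == CE:
        # By default CE is also classified as L4S, unless the optional
        # Section 5.3 heuristics judge it to be re-marked Classic traffic.
        if heuristics_enabled and ce_looks_classic:
            return "Classic"
        return "L4S"
    return "Classic"              # Not-ECT and ECT(0) get Classic treatment
```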

   L4S ECN works with ECN tunnelling and encapsulation behaviour as is,
   except there is one known case where careful attention to
   configuration is required, which is detailed in Section 6.

   L4S ECN is currently on the experimental track.  So Section 7
   collects together the general questions and issues that remain open
   for investigation during L4S experimentation.  Open issues or
   questions specific to particular components are called out in the
   specifications of each component part, such as the DualQ
   [I-D.ietf-tsvwg-aqm-dualq-coupled].

   The IANA assignment of the L4S identifier is specified in
   Section 8.  And Section 9 covers security considerations specific to
   the L4S identifier.  System security aspects, such as policing and
   privacy, are covered in the L4S architecture
   [I-D.ietf-tsvwg-l4s-arch].


Section 1.1 refers to definitions in Section 1.2, so I'd suggest that
Section 1.2 come first.

[BB] The reason the Problem Statement is the first subsection is that it's what motivates people to read on.

Your suggestion has been made by others in the past, and the solution was to informally explain new terms in the sections before the formal terminology section, as they arose.
The formal terminology section can be considered as the end of the Introductory material and the start of the formal body of the spec.

If there are phrases that are not clearly explained before the terminology section, pls do point them out.
We can reconsider moving the terminology section to 1.1 if there are a lot.
But we'd rather the reader could continue straight into the summary of the problem and that it is understandable stand-alone - without relying on formal definitions elsewhere.

Section 1.3 provides basic information on Scope and the relationship of this
document to other documents.  I was therefore expecting Section 7 to include
questions on some of the related documents (e.g. how L4S might be tested along
with RTP).

[BB] That isn't the role of this document, which would be too abstract (or too long) if it had to cover how to test each different type of congestion control and each type of AQM.
Quoting from §7:
   The specification of each scalable congestion control will need to
   include protocol-specific requirements for configuration and
   monitoring performance during experiments.  Appendix A of the
   guidelines in [RFC5706] provides a helpful checklist.

Over the last 3 months, everyone involved in interop testing has been defining all the test plans, which had their first test-drive last week. Indeed, the success of the planning and organization of the tests surprised us all - kudos to Greg White, who was largely responsible for coordinating it.
We may end up writing that all up as a separate draft. If many tests were documented centrally like this, each CC or AQM might only need to identify any special-case tests specific to itself.
That might even cover testing with live traffic over the Internet as well. But let's walk before we run.


I wonder whether much of Section 2 could be combined with Appendix B, with the
remainder moved into the Introduction, which might also refer to Appendix B.

[BB] What is the problem that you are trying to solve by breaking up this section?

If we split up this section, someone else will want parts moved back, or something else moved. Unless there's a major problem with this section, we'd rather it stayed in one piece. Its main purpose is to record the requirements and to say (paraphrasing), "The outcome is a compromise between requirements 'cos header space is limited. Other solutions were considered, but this one was the least worst."

Summary: no action here yet, pending motivating reasoning from your side.

Section 4.2

   RTP over UDP:  A prerequisite for scalable congestion control is for
      both (all) ends of one media-level hop to signal ECN
      support [RFC6679] and use the new generic RTCP feedback format of
      [RFC8888].  The presence of ECT(1) implies that both (all) ends of
      that media-level hop support ECN.  However, the converse does not
      apply.  So each end of a media-level hop can independently choose
      not to use a scalable congestion control, even if both ends
      support ECN.

[BA] The document earlier refers to an L4S modified version of SCreAM, but does
not provide a reference.  Since RFC 8888 is not deployed today, this paragraph
(and Section 7) leaves me somewhat unclear on the plan to evaluate L4S impact
on RTP. Or is the focus on experimentation with RTP over QUIC (e.g.
draft-ietf-avtcore-rtp-over-quic)?

[BB] Ingemar has given this reply:
[IJ] RFC8298 (SCReAM) in its current version does not describe support for L4S. The open source running code on github does however support L4S. An update of RFC8298 has lagged behind but I hope to start with an RFC8298-bis after the vacation.
RFC8888 is implemented in the publicly available code for SCReAM (https://github.com/EricssonResearch/scream). This code has been extensively used in demos of 5G Radio Access Networks with L4S capability. The example demos have been cloud gaming and video streaming for remote controlled cars.
The code includes gstreamer plugins as well as multi-camera code tailored for NVidia Jetson Nano/Xavier NX (that can be easily modified for other platforms).

[BB] As an interim reference, Ingemar's README is already cited as [SCReAM-L4S]. It is a brief but decent document about the L4S variant of SCReAM, which also gives further references (and the open source code is its own spec).

Summary: The RFC 8888 part of this question seems to be about plans for how the software for another RFC is expected to be installed or bundled.
Is this a question that you want this draft to answer?

   For instance, for DCTCP [RFC8257], TCP Prague
   [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux] and the
   L4S variant of SCReAM [RFC8298], the average recovery time is always
   half a round trip (or half a reference round trip), whatever the flow
   rate.

[BA] I'm not sure that an L4S variant of SCReAM could really be considered
"scalable" where simulcast or scalable video coding was being sent. In these
scenarios, adding a layer causes a multiplicative increase in bandwidth, so
that "probing" (e.g. stuffing the channel with RTX probes or FEC) is often a
necessary precursor to make it possible to determine whether adding layers is
actually feasible.

[BB] Ingemar has given this reply:
[IJ] The experiments run so far with SCReAM have been with the NVENC encoder, which supports rate changes on a frame by frame basis, and Jetson Nano/Xavier NX/Xavier AGX, which is a bit slower in its rate control loop. So the actual probing is done by adjusting the target bitrate of the video encoder.

[BB] Since last week (in the first L4S interop), we now have 2 other implementations of real-time video with L4S support directly over UDP (from NVIDIA and Nokia); in addition to the original 2015 demo (also from Nokia). You'd have to ask Ermin Sakic <esakic@xxxxxxxxxx> about the NVIDIA coding, and similarly Koen De Schepper <koen.de_schepper@xxxxxxxxx> about the Nokia ones. I do know that both Nokia ones change rate packet-by-packet (and if channel conditions are poor, the new one can even reduce down to 500kb/s while still preserving the same low latency).

The message here is that, for low latency video, you can't just use any old encoding that was designed without latency in mind.
Again, is this a question that you want this draft to answer? It seems like something that would be discussed in the spec of each r-t CC technique.

   As with all transport behaviours, a detailed specification (probably
   an experimental RFC) is expected for each congestion control,
   following the guidelines for specifying new congestion control
   algorithms in [RFC5033].  In addition it is expected to document
   these L4S-specific matters, specifically the timescale over which the
   proportionality is averaged, and control of burstiness.  The recovery
   time requirement above is worded as a 'SHOULD' rather than a 'MUST'
   to allow reasonable flexibility for such implementations.

[BA] Is the L4S variant of SCReaM one of the detailed specifications that is
going to be needed? From the text I wasn't sure whether this was documented
work-in-progress or a future work item.

[BB] We cannot force implementers to write open specifications of their algorithms. Implementers might have secrecy constraints, or just not choose to invest the time in spec writing. So there is no hit-list of specs that 'MUST' be written, except we consider it proper to document the reference implementation of the Prague CC.
Nonetheless, others also consider it proper to document their algorithm (e.g. BBRv2), and in the case of SCReAM, Ingemar has promised he will (as quoted above).

We don't (yet?) have a description of the latest two implementations that the draft can refer to (they only announced these on the first day of the interop last week).
We try to keep a living web page up to date that points to current implementations ( https://l4s.net/#code ). However, I don't think the RFC Editor would accept this as an archival reference.

Section 4.3.1

      To summarize, the coexistence problem is confined to cases of
      imperfect flow isolation in an FQ, or in potential cases where a
      Classic ECN AQM has been deployed in a shared queue (see the L4S
      operational guidance [I-D.ietf-tsvwg-l4sops] for further details
      including recent surveys attempting to quantify prevalence).
      Further, if one of these cases does occur, the coexistence problem
      does not arise unless sources of Classic and L4S flows are
      simultaneously sharing the same bottleneck queue (e.g. different
      applications in the same household) and flows of each type have to
      be large enough to coincide for long enough for any throughput
      imbalance to have developed.

[BA] This seems to me to be one of the key questions that could limit the
"incremental deployment benefit".  A reference to the discussion in Section 7
might be appropriate here.

[BB] OK. At the end of the above para I've added:

                                    Therefore, how often the coexistence
       problem arises in practice is listed in Section 7 as an open
       question that L4S experiments will need to answer.


5.4.1.1.1.  'Safe' Unresponsive Traffic

   The above section requires unresponsive traffic to be 'safe' to mix
   with L4S traffic.  Ideally this means that the sender never sends any
   sequence of packets at a rate that exceeds the available capacity of
   the bottleneck link.  However, typically an unresponsive transport
   does not even know the bottleneck capacity of the path, let alone its
   available capacity.  Nonetheless, an application can be considered
   safe enough if it paces packets out (not necessarily completely
   regularly) such that its maximum instantaneous rate from packet to
   packet stays well below a typical broadband access rate.
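
[Editor's aside: the "maximum instantaneous rate from packet to packet" criterion in the paragraph above can be made concrete with a hypothetical back-of-the-envelope check; the function and example numbers are illustrative, not from the draft.]

```python
# Hypothetical illustration (not from the draft) of the pacing
# criterion quoted above: the instantaneous rate implied by two
# consecutive packets a given time gap apart.

def instantaneous_rate_bps(packet_bytes, inter_packet_gap_s):
    """Rate implied by consecutive packets of packet_bytes sent
    inter_packet_gap_s seconds apart, in bits per second."""
    return packet_bytes * 8 / inter_packet_gap_s

# e.g. 1500-byte packets paced 10 ms apart imply 1.2 Mb/s, which stays
# well below a typical broadband access rate, so such pacing could be
# considered 'safe enough' in the sense of the paragraph above.
rate = instantaneous_rate_bps(1500, 0.010)
```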

[BA] The problem with video traffic is that the encoder typically
targets an "average bitrate" resulting in a keyframe with a
bitrate that is above the bottleneck bandwidth and delta frames
that are below it.  Since the "average rate" may not be
resettable before sending another keyframe, video has limited
ability to respond to congestion other than perhaps by dropping
simulcast and SVC layers. Does this mean that a video is
"Unsafe Unresponsive Traffic"?

[BB] This section on 'Safe' Unresponsive traffic is about traffic that is so low rate that it doesn't need to use ECN to respond to congestion at all (e.g. DNS, NTP). Video definitely does not fall into that category.

I think your question is really asking whether video even /with/ ECN support can be considered responsive enough to maintain low latency. For this you ought to try to see the demonstration that Nokia did last week (if a recording is put online) or the Ericsson demonstration which is already online [EDT-5GLL]. Both over emulated 5G radio access networks with variability of channel conditions, and both showed very fast interaction within the video with no perceivable lag to the human eye. With the Nokia one last week, using finger gestures sent over the radio network, you could control the viewport into a video from a 360⁰ camera, which was calculated and generated at the remote end. No matter how fast you shook your finger around, the viewport stayed locked onto it.

Regarding keyframes, for low latency video, these are generally spread across the packets carrying the other frames.

[EDT-5GLL] Ericsson and DT demo 5G low latency feature: https://www.ericsson.com/en/news/2021/10/dt-and-ericsson-successfully-test-new-5g-low-latency-feature-for-time-critical-applications

I detect here that this also isn't a question about the draft - more a question of "I need to see it to believe it"?

NITs

Abstract

   The L4S identifier defined in this document distinguishes L4S from
   'Classic' (e.g. TCP-Reno-friendly) traffic.  It gives an incremental
   migration path so that suitably modified network bottlenecks can
   distinguish and isolate existing traffic that still follows the
   Classic behaviour, to prevent it degrading the low queuing delay and
   low loss of L4S traffic.  This specification defines the rules that

[BA] Might be clearer to say "This allows suitably modified network..."

[BB] I'm not sure what the problem is. But I'm assuming you're saying you tripped over the word 'gives'. How about simplifying:

             It gives an incremental
   migration path so that suitably modified  Then, network bottlenecks can be incrementally modified to
   distinguish and isolate existing traffic that still follows the
   Classic behaviour, to prevent it degrading the low queuing delay and
   low loss of L4S traffic.  

The words "incremental migration path" suggest that the deployment of
L4S-capable network devices and endpoints provides incremental benefit.
That is, once new network devices are put in place (e.g. by replacing
a last-mile router), devices that are upgraded to support L4S will
see benefits, even if other legacy devices are not upgraded.

If this is the point you are looking to make, you might want to clarify
the language.

[BB] I hope the above diff helps. Is that enough for an abstract, which has to be kept very brief?
Especially as all the discussion about incremental deployment is in the L4S architecture doc, so it wouldn't be appropriate to make deployment a big thing in the abstract of this draft.
Nonetheless, we can flesh out the text where incremental deployment is already mentioned in the intro (see our suggested text for your later point about this, below).

Summary: We propose only the above diff on these points about "incremental migration" in the abstract.

   L4S transports and network elements need to follow with the intention
   that L4S flows neither harm each other's performance nor that of
   Classic traffic.  Examples of new active queue management (AQM)
   marking algorithms and examples of new transports (whether TCP-like
   or real-time) are specified separately.

[BA] Don't understand "need to follow with the intention". Is this
stating a design principle, or does it represent deployment
guidance?

[BB] I think a missing comma is the culprit. Sorry for confusion. It should be:
   This specification defines the rules that
   L4S transports and network elements need to follow, with the intention
   that L4S flows neither harm each other's performance nor that of
   Classic traffic.

The sentence "L4S flows neither harm each other's performance nor that
of classic traffic" might be better placed after the first sentence
in the second paragraph, since it relates in part to the "incremental
deployment benefit" argument.

[BB] That wouldn't be appropriate, because:
* To prevent "Classic harms L4S" an L4S AQM needs the L4S identifier on packets to isolate them
* To prevent "L4S harms Classic" needs the L4S sender to detect that it's causing harm which is sender behaviour (rules), not identifier-based.
So the sentence has to come after the point about "the spec defines the rules".

Summary: we propose no action on this point.

Section 1. Introduction

   This specification defines the protocol to be used for a new network
   service called low latency, low loss and scalable throughput (L4S).
   L4S uses an Explicit Congestion Notification (ECN) scheme at the IP
   layer with the same set of codepoint transitions as the original (or
   'Classic') Explicit Congestion Notification (ECN [RFC3168]).
   RFC 3168 required an ECN mark to be equivalent to a drop, both when
   applied in the network and when responded to by a transport.  Unlike
   Classic ECN marking, the network applies L4S marking more immediately
   and more aggressively than drop, and the transport response to each

   [BA] Not sure what "aggressively" means here. In general, marking
   traffic seems like a less aggressive action than dropping it. Do
   you mean "more frequently"?

[BB] OK; 'frequently' it is.

(FWIW, I recall that the transport response used to be described as more aggressive (because it reduces less in response to each mark), and the idea was that using aggressive for both would segue nicely into the next sentence about the two counterbalancing. Someone asked for that to be changed, and now the last vestiges of that failed literary device are cast onto the cutting room floor. The moral of this tale: never try to write a literary masterpiece by committee ;)

   Also, it's a bit of a run-on sentence, so I'd break it up:

   "than drop.  The transport response to each"

   mark is reduced and smoothed relative to that for drop.  The two
   changes counterbalance each other so that the throughput of an L4S
   flow will be roughly the same as a comparable non-L4S flow under the
   same conditions.  

[BB] Not sure about this - by the next sentence (about the two changes), the reader has lost track of them. How about using numbering to structure the long sentence:
   Unlike
   Classic ECN marking: i) the network applies L4S marking more immediately
   and more aggressively than drop; and ii) the transport response to each
   mark is reduced and smoothed relative to that for drop. The two 
   changes counterbalance each other...
OK?
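
[Editor's aside: the counterbalance between points i) and ii) above can be illustrated with a DCTCP-style response (RFC 8257, Section 3.3). The function names below are hypothetical; only the window-reduction formulas come from the cited RFCs.]

```python
# Illustrative sketch (not from the draft) contrasting the Classic
# response to an ECN mark with a DCTCP-style smoothed response,
# i.e. point ii) of the numbered rewrite above.

def classic_response(cwnd, marked_this_rtt):
    # RFC 3168 semantics: a mark is equivalent to a loss, so the
    # congestion window is halved (at most once per round trip).
    return cwnd / 2 if marked_this_rtt else cwnd

def scalable_response(cwnd, alpha):
    # alpha is a moving average of the fraction of marked packets (0..1),
    # as in RFC 8257: cwnd <- cwnd * (1 - alpha/2).
    # Frequent-but-gentle marks yield a small, smoothed reduction.
    return cwnd * (1 - alpha / 2)
```

With, say, 10% of packets marked (alpha = 0.1), the scalable sender reduces its window by only 5%, whereas a single Classic mark halves it; the network compensates by marking L4S packets far more frequently, which is how the two changes counterbalance.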

Nonetheless, the much more frequent ECN control
   signals and the finer responses to these signals result in very low
   queuing delay without compromising link utilization, and this low
   delay can be maintained during high load.  For instance, queuing
   delay under heavy and highly varying load with the example DCTCP/
   DualQ solution cited below on a DSL or Ethernet link is sub-
   millisecond on average and roughly 1 to 2 milliseconds at the 99th
   percentile without losing link utilization [DualPI2Linux], [DCttH19].

   [BA] I'd delete "cited below" since you provide the citation at
   the end of the sentence.

[BB] 'Cited below' referred to the DCTCP and DualQ citations in the subsequent para, because this is the first time either term has been mentioned.
    'Described below'
was what was really meant. I think that makes it clear enough (?).

   Note that the inherent queuing delay while waiting to acquire a
   discontinuous medium such as WiFi has to be minimized in its own
   right, so it would be additional to the above (see section 6.3 of the
   L4S architecture [I-D.ietf-tsvwg-l4s-arch]).

   [BA] Not sure what "discontinuous medium" means. Do you mean
   wireless?  Also "WiFi" is a colloquialism; the actual standard
   is IEEE 802.11 (WiFi Alliance is an industry organization).
   Might reword this as follows:

   "Note that the changes proposed here do not lessen delays from
    accessing the medium (such as is experienced in [IEEE-802.11]).
    For discussion, see Section 6.3 of the L4S architecture
    [I-D.ietf-tsvwg-l4s-arch]."

[BB] We've used 'shared' instead. Other examples of shared media are LTE, 5G, DOCSIS (cable), DVB (satellite), PON (passive optical network). So I've just said 'wireless' rather than give a gratuitous citation of 802.11.

   Note that the inherent queuing delay while waiting to acquire a
   discontinuous
   shared medium such as WiFi wireless has to be minimized in its own
   right, so it would be additional added to the above above.  It is
   a different issue that needs to be addressed, but separately (see
   section 6.3 of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]).

Then, because wireless is less specific, I've taken out 'inherent' because strictly medium acquisition delay is not inherent to a medium - it depends on the multiplexing scheme. For instance radio networks can use CDM (code division multiplexing), and they did in 3G.
'Inherent' was trying to get over the sense that this delay is not amenable to reduction by congestion control. Rather than try to cram all those concepts into one sentence, I've split it.

OK?

   L4S is not only for elastic (TCP-like) traffic - there are scalable
   congestion controls for real-time media, such as the L4S variant of
   the SCReAM [RFC8298] real-time media congestion avoidance technique
   (RMCAT).  The factor that distinguishes L4S from Classic traffic is

   [BA] Is there a document that defines the L4S variant of SCReAM?

[BB] I've retagged Ingemar's readme as [SCReAM-L4S], and included it here to match the other two occurrences of SCReAM:

                                           such as the L4S variant
   [SCReAM-L4S] of the SCReAM [RFC8298] real-time media congestion
   avoidance technique (RMCAT).

It sounds like Ingemar plans to update RFC8298 with a bis, so I guess eventually [RFC8298] should automatically become a reference to its own update.

   its behaviour in response to congestion.  The transport wire
   protocol, e.g. TCP, QUIC, SCTP, DCCP, RTP/RTCP, is orthogonal (and
   therefore not suitable for distinguishing L4S from Classic packets).

   The L4S identifier defined in this document is the key piece that
   distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic.  It
   gives an incremental migration path so that suitably modified network
   bottlenecks can distinguish and isolate existing Classic traffic from
   L4S traffic to prevent the former from degrading the very low delay
   and loss of the new scalable transports, without harming Classic
   performance at these bottlenecks.  Initial implementation of the
   separate parts of the system has been motivated by the performance
   benefits.

[BA] I think you are making an "incremental benefit" argument here,
but it might be made more explicit:

"  The L4S identifier defined in this document distinguishes L4S from
   'Classic' (e.g. Reno-friendly) traffic. This allows suitably
   modified network bottlenecks to distinguish and isolate existing
   Classic traffic from L4S traffic, preventing the former from
   degrading the very low delay and loss of the new scalable
   transports, without harming Classic performance. As a result,
   deployment of L4S in network bottlenecks provides incremental
   benefits to endpoints whose transports support L4S."

[BB] We don't really want to lose the point about the identifier being key. So I've kept that. And for the middle sentence, I've used the simpler construction developed above (for the similar wording in the abstract).

Regarding the last sentence, no, it meant more than that. It meant that, even though implementer's customers get no benefit until both parts are deployed, for some implementers the 'size of the potential prize' has already been great enough to warrant investment in implementing their part, without any guarantee that other parts will be implemented. However, we need to be careful not to stray into conjecture and predictions, particularly not commercial ones, which is why this sentence was written in the past tense. Pulling this all together, how about:

   The L4S identifier defined in this document is the key piece that
   distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic.  It
   gives an incremental migration path so that suitably modified  Then,
   network bottlenecks can be incrementally modified to distinguish and
   isolate existing Classic traffic from L4S traffic, to prevent the
   former from degrading the very low queuing delay and loss of the new
   scalable transports, without harming Classic performance at these
   bottlenecks.  Although both sender and network deployment are
   required before any benefit, initial implementations of the separate
   parts of the system have been motivated by the potential performance
   benefits.
I considered adding "have already been motivated..." or "at the time of writing, initial implementations..." but decided against both - they sounded a bit hyped up.
What do you think?


Section 1.1.  Latency, Loss and Scaling Problems

   Latency is becoming the critical performance factor for many (most?)
   applications on the public Internet, e.g. interactive Web, Web
   services, voice, conversational video, interactive video, interactive
   remote presence, instant messaging, online gaming, remote desktop,
   cloud-based applications, and video-assisted remote control of
   machinery and industrial processes.  In the 'developed' world,
   further increases in access network bit-rate offer diminishing
   returns, whereas latency is still a multi-faceted problem.  In the
   last decade or so, much has been done to reduce propagation time by
   placing caches or servers closer to users.  However, queuing remains
   a major intermittent component of latency.

[BA] Since this paragraph provides context for the work, you might
consider placing it earlier (in Section 1 as well as potentially in
the Abstract).

[BB] The L4S architecture Intro already starts like you suggest.
    See https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-l4s-arch-19#section-1

The present doc starts out more as a technical spec might, with a 4-para intro focusing on what it says technically. Then it has a fairly long subsection to summarize the problem for those reading it stand-alone. That is intentional (so readers who have already read the architecture can easily jump past it).

Summary: We propose to leave the opening of the intro unchanged.

Might modify this as follows:

"
   Latency is the critical performance factor for many Internet
   applications, including web services, voice, realtime video,
   remote presence, instant messaging, online gaming, remote
   desktop, cloud services, and remote control of machinery and
   industrial processes. In these applications, increases in access
   network bitrate may offer diminishing returns. As a result,
   much has been done to reduce delays by placing caches or
   servers closer to users. However, queuing remains a major
   contributor to latency."
We've picked up most, but not all, of your suggestions:
   Latency is becoming the critical performance factor for many
   Internet applications, e.g. interactive web, web services, voice,
   conversational video, interactive video, interactive remote
   presence, instant messaging, online gaming, remote desktop,
   cloud-based applications & services, and video-assisted remote
   control of machinery and industrial processes.  In many parts of the
   'developed' world, further increases in access network bit rate
   offer diminishing returns [Dukkipati06], whereas latency is still a
   multi-faceted problem.  As a result, much has been done to reduce
   propagation time by placing caches or servers closer to users.
   However, queuing remains a major, albeit intermittent, component of
   latency.

We've added [Dukkipati06], because we were asked to justify the similar 'diminishing returns' claim in the L4S architecture, and Dukkipati06 provides a plot supporting that in its intro:
   [Dukkipati06]
              Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is
              the Right Metric for Congestion Control", ACM CCR
              36(1):59--62, January 2006,
              <https://dl.acm.org/doi/10.1145/1111322.1111336>.

The distinctions between different applications of the same technology were deliberately intended to distinguish different degrees of latency sensitivity, so we left some of them in.
OK?

   The Diffserv architecture provides Expedited Forwarding [RFC3246], so
   that low latency traffic can jump the queue of other traffic.  If
   growth in high-throughput latency-sensitive applications continues,
   periods with solely latency-sensitive traffic will become
   increasingly common on links where traffic aggregation is low.  For
   instance, on the access links dedicated to individual sites (homes,
   small enterprises or mobile devices).  These links also tend to
   become the path bottleneck under load.  During these periods, if all
   the traffic were marked for the same treatment, at these bottlenecks
   Diffserv would make no difference.  Instead, it becomes imperative to
   remove the underlying causes of any unnecessary delay.

[BA] This paragraph is hard to follow. You might consider rewriting it as
follows:

   "The Diffserv architecture provides Expedited Forwarding [RFC3246], to
   enable low latency traffic to jump the queue of other traffic. However,
   the latency-sensitive applications are growing in number along
   with the fraction of latency-sensitive traffic. On bottleneck links where
   traffic aggregation is low (such as links to homes, small enterprises or
   mobile devices), if all traffic is marked for the same treatment, Diffserv
   will not make a difference. Instead, it is necessary to remove unnecessary
   delay."

[BB] Your proposed replacement has the following problems:
* It relies on prediction (the previous text avoided prediction, instead saying "if growth ... continues");
* The proposed replacement loses the critical sense of "periods with solely latency sensitive traffic" (not all the time)
* it also loses the critical idea that the same links that are low stat mux tend to also be those where the bottleneck is.
How about:

   The Diffserv architecture provides Expedited Forwarding [RFC3246],
   so that low latency traffic can jump the queue of other traffic.  If
   growth in high-throughput latency-sensitive applications continues,
   periods with solely latency-sensitive traffic will become
   increasingly common on links where traffic aggregation is low.
   During these periods, if all the traffic were marked for the same
   treatment, Diffserv would make no difference.  The links with low
   aggregation also tend to become the path bottleneck under load, for
   instance, the access links dedicated to individual sites (homes,
   small enterprises or mobile devices).  So, instead of
   differentiation, it becomes imperative to remove the underlying
   causes of any unnecessary delay.

I tried to guess what you found hard to follow, but still to keep all the concepts. The main changes were:
*  to switch the sentence order so "periods with solely" and "these periods" were not a few sentences apart.
* to make it clear what 'instead' meant.
Better?


  long enough for the queue to fill the buffer, making every packet in
   other flows sharing the buffer sit through the queue.

   [BA] "sit through" -> "share"

[BB] Nah, that's a tautology: "other flows sharing the buffer share the queue".
And it loses the sense of waiting. If "sit through" isn't understandable, how about

   "...causing every packet in other flows sharing the buffer to have to
   work its way through the queue.
"
?

   Active queue management (AQM) was originally developed to solve this
   problem (and others).  Unlike Diffserv, which gives low latency to
   some traffic at the expense of others, AQM controls latency for _all_
   traffic in a class.  In general, AQM methods introduce an increasing
   level of discard from the buffer the longer the queue persists above
   a shallow threshold.  This gives sufficient signals to capacity-
   seeking (aka. greedy) flows to keep the buffer empty for its intended
   purpose: absorbing bursts.  However, RED [RFC2309] and other
   algorithms from the 1990s were sensitive to their configuration and
   hard to set correctly.  So, this form of AQM was not widely deployed.

   More recent state-of-the-art AQM methods, e.g. FQ-CoDel [RFC8290],
   PIE [RFC8033], Adaptive RED [ARED01], are easier to configure,
   because they define the queuing threshold in time not bytes, so it is
   invariant for different link rates.  However, no matter how good the
   AQM, the sawtoothing sending window of a Classic congestion control
   will either cause queuing delay to vary or cause the link to be
   underutilized.  Even with a perfectly tuned AQM, the additional
   queuing delay will be of the same order as the underlying speed-of-
   light delay across the network, thereby roughly doubling the total
   round-trip time.

[BA] Would suggest rewriting as follows:

"  More recent state-of-the-art AQM methods such as FQ-CoDel [RFC8290],
   PIE [RFC8033] and Adaptive RED [ARED01], are easier to configure,
   because they define the queuing threshold in time not bytes, providing
   link rate invariance.  However, AQM does not change the "sawtooth"
   sending behavior of Classic congestion control algorithms, which
   alternates between varying queuing delay and link underutilization.
   Even with a perfectly tuned AQM, the additional queuing delay will
   be of the same order as the underlying speed-of-light delay across
   the network, thereby roughly doubling the total round-trip time."

[BB] We've taken most of these suggestions, but link rate invariance is rather a mouthful.
Also more queue delay or more under-utilization wasn't meant to imply alternating between the two.
So how about:

   More recent state-of-the-art AQM methods, such as FQ-CoDel
   [RFC8290], PIE [RFC8033] or Adaptive RED [ARED01], are easier to
   configure, because they define the queuing threshold in time not
   bytes, so configuration is invariant whatever the link rate.
   However, no matter how good the AQM, the sawtoothing sending window
   of a Classic congestion control creates a dilemma for the operator:
   i) either configure a shallow AQM operating point, so the tips of
   the sawteeth cause minimal queue delay but the troughs underutilize
   the link, or ii) configure the operating point deeper into the
   buffer, so the troughs utilize the link better but then the tips
   cause more delay variation.  Even...

OK?
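To put rough numbers on that dilemma, here is a toy model (not any real AQM): a Reno-like window sawtoothing between W/2 and W over a link with a 100-packet BDP, with the window assumed to grow linearly through each cycle, and the rate proportional to the window whenever the queue is empty. All values are invented for illustration:

```python
# Toy illustration of the operator's dilemma: shallow AQM operating
# point = low delay but underutilization; deep operating point = full
# utilization but ~a BDP of standing queue at the sawtooth tips.

BDP = 100.0  # bandwidth-delay product, in packets (invented value)

def sawtooth_stats(w_max: float):
    """For a Reno-like sawtooth between w_max/2 and w_max, return
    (peak queued packets, rough link utilization).  Assumes w_max > BDP
    and linear window growth over the cycle."""
    w_min = w_max / 2
    peak_queue = max(w_max - BDP, 0.0)
    if w_min >= BDP:
        util = 1.0  # the queue never fully drains: full utilization
    else:
        # The link idles partially while the window is below the BDP.
        frac_under = (BDP - w_min) / (w_max - w_min)  # time below BDP
        avg_w_under = (w_min + BDP) / 2               # mean window then
        util = (1 - frac_under) + frac_under * (avg_w_under / BDP)
    return peak_queue, util

# i) shallow operating point: tips add only ~5 pkts of queue,
#    but utilization is only ~79%
print(sawtooth_stats(w_max=105))
# ii) deeper operating point: full utilization, but the tips add
#     ~100 pkts (~= one extra base RTT of queuing delay)
print(sawtooth_stats(w_max=200))
```

This also matches the quoted text's point that even a well-tuned AQM leaves Classic traffic with added queuing delay of the same order as the base round trip, if the link is to stay utilized.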

   If a sender's own behaviour is introducing queuing delay variation,
   no AQM in the network can 'un-vary' the delay without significantly
   compromising link utilization.  Even flow-queuing (e.g. [RFC8290]),
   which isolates one flow from another, cannot isolate a flow from the
   delay variations it inflicts on itself.  Therefore those applications
   that need to seek out high bandwidth but also need low latency will
   have to migrate to scalable congestion control.

[BA] I'd suggest you delete the last sentence, since the point is
elaborated on in more detail in the next paragraph.

[BB] Actually, this point is not made in the next para (but you might have thought it was because it's not clear, so below I've tried to fix it).
Indeed, I've realized we need to /add/ to the last sentence, because we haven't yet said what a scalable control is...

       ...migrate to scalable congestion control, which uses much smaller 
   sawtooth variations.

   Altering host behaviour is not enough on its own though.  Even if
   hosts adopt low latency behaviour (scalable congestion controls),
   they need to be isolated from the behaviour of existing Classic
   congestion controls that induce large queue variations.  L4S enables
   that migration by providing latency isolation in the network and

[BA] "enables that migration" -> "motivates incremental deployment"

   distinguishing the two types of packets that need to be isolated: L4S
   and Classic.  L4S isolation can be achieved with a queue per flow
   (e.g. [RFC8290]) but a DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled] is
   sufficient, and actually gives better tail latency.  Both approaches
   are addressed in this document.

[BB] The intended meaning here is 'enables' (technical feasibility), not motivates (human inclination).
But whatever, in the rewording below, I don't think either is needed. I'm also assuming that middle sentence didn't make sense for you, and I think I see why. So how about:

   Altering host behaviour is not enough on its own though.  Even if
   hosts adopt low latency behaviour (scalable congestion controls),
   they need to be isolated from the large queue variations induced by
   existing Classic congestion controls.  L4S AQMs provide that latency
   isolation in the network and the L4S identifier enables the AQMs to
   distinguish the two types of packets that need to be isolated: L4S
   and Classic.

How's that?

   The DualQ solution was developed to make very low latency available
   without requiring per-flow queues at every bottleneck.  This was

[BA] "This was" -> "This was needed"

[BB] Not quite that strong. More like:
    "This was useful"

   Latency is not the only concern addressed by L4S: It was known when

   [BA] ":" -> "."

[BB] OK.

   explanation is summarised without the maths in Section 4 of the L4S

   [BA] "summarised without the maths" -> "summarized without the mathematics"

[BB] OK - that nicely side-steps stumbles from either side of the Atlantic.

1.2.  Terminology

[BA] Since Section 1.1 refers to some of the Terminology defined in
this section, I'd consider placing this section before that one.

[BB] See earlier for push-back on this.

   Reno-friendly:  The subset of Classic traffic that is friendly to the
      standard Reno congestion control defined for TCP in [RFC5681].
      The TFRC spec. [RFC5348] indirectly implies that 'friendly' is

      [BA] "spec." -> "specification"

[BB] I checked this after a previous review comment, and 'spec' is now considered to be a word in its own right. I should have removed the full-stop though, which I did for all other occurrences.
However, the RFC Editor might have a style preference on this point, in which case I will acquiesce.


      defined as "generally within a factor of two of the sending rate
      of a TCP flow under the same conditions".  Reno-friendly is used
      here in place of 'TCP-friendly', given the latter has become
      imprecise, because the TCP protocol is now used with so many
      different congestion control behaviours, and Reno is used in non-

      [BA] "Reno is used" -> "Reno can be used"

[BB] OK
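Incidentally, the "factor of two" quoted above can be made concrete with the simple square-root model of Reno throughput that underlies TFRC, rate ≈ (MSS/RTT)·sqrt(3/(2p)). The parameter values below are purely illustrative:

```python
from math import sqrt

# Simple 'square-root' model of Reno throughput, as used in the TFRC
# spec's notion of friendliness.  Parameter values here are invented.

def reno_rate(mss: float, rtt: float, p: float) -> float:
    """Approximate Reno sending rate in bytes/s at loss probability p."""
    return (mss / rtt) * sqrt(3.0 / (2.0 * p))

def is_reno_friendly(rate: float, mss: float, rtt: float, p: float) -> bool:
    """'Generally within a factor of two' of Reno's rate, per RFC 5348."""
    ref = reno_rate(mss, rtt, p)
    return ref / 2 <= rate <= ref * 2

# e.g. MSS 1500 B, RTT 20 ms, 1% loss gives roughly 900 kB/s:
ref = reno_rate(1500, 0.020, 0.01)
assert is_reno_friendly(ref * 1.5, 1500, 0.020, 0.01)
assert not is_reno_friendly(ref * 3, 1500, 0.020, 0.01)
```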

4.  Transport Layer Behaviour (the 'Prague Requirements')

[BA] This section is empty and there are no previous references to Prague. So I
think you need to say a few words here to introduce the section.

[BB] OK. How about:

   This section defines L4S behaviour at the transport layer, also known
   as the Prague L4S Requirements (see Appendix A for the origin of the
   name).


Again, thank you very much for all the time and effort you've put into this review.

Regards



Bob



-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/
-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call
