Re: [Last-Call] [Tsv-art] Tsvart last call review of draft-ietf-teas-rfc3272bis-24

Hi Bob,

Thanks for continuing to discuss these points. I've tried to edit down to the remaining issues. I feel that we may not be reaching convergence on all points, but that we have discussed the points and provided explanations for the changes made or where no change is made.

More in line and on the other thread.

Cheers,
Adrian

-----Original Message-----
From: Bob Briscoe <ietf@xxxxxxxxxxxxxx> 
Sent: 28 July 2023 19:26
To: adrian@xxxxxxxxxxxx; tsv-art@xxxxxxxx
Cc: draft-ietf-teas-rfc3272bis.all@xxxxxxxx; last-call@xxxxxxxx; teas@xxxxxxxx
Subject: Re: [Tsv-art] Tsvart last call review of draft-ietf-teas-rfc3272bis-24

Adrian,

Sorry, I overlooked your later inline responses below before sending my 
follow-up review of this new rev, sent last night (27 Jul).
See [BB] below.

>> ==SERIOUS - PLS FIX==
>>
>> 1. The "Congestion Problem"
>>  From a transport area perspective, there is one particularly glaring omission.
>> A very large majority of Internet traffic is either capacity-seeking or at
>> least adaptive to available capacity. This traffic intentionally induces
>> congestion at the path bottleneck, which is 'good congestion', because it
>> maximizes capacity utilization and minimizes completion time. So when this
>> draft repeatedly says that the goal of TE is to combat "The Congestion
>> Problem", it needs to explain why one part of the IETF is trying to induce
>> congestion while another (this draft) is trying to combat it.
>>
>> The explanation is that most network operators design their networks with one
>> node per-customer (or per-customer-site) as the path bottleneck (or two nodes,
>> if dual-homed). Then this node (typically the multi-service edge or equivalent)
>> is where operators can focus deployment of traffic management and control
>> functions including service differentiation, while other nodes can be
>> overprovisioned so that they either do not need these functions at all, or they
>> only need much-simplified functions that complement the primary controls at the
>> edge node.
>>
>> Once this context has been explained, the goal of TE is indeed to /avoid/
>> congestion at all these other nodes, while the goal of endpoints is to /induce/
>> congestion at their bottleneck node (but only when they have something to send
>> or receive - the rest of the time they are idle).
>>
>> Each occurrence of 'congestion problem' will then need to be qualified. Eg:
>>
>> * "Clearly, congestion is highly undesirable."
>>
>> * "Congestion is one of the most significant problems in an operational IP
>> context."
>>
>> * "If traffic from a source to a destination exceeds the capacity of a link
>> along the shortest path, the link (and hence the shortest path) becomes
>> congested while a longer path between these two nodes may be under-
>> utilized."
>> [Given latency is important to many / most applications, if throughput is
>> sufficient, it would be wrong to 'solve' this 'problem' by using the longer
>> path. The solution would be to minimize the delay that results from congestion
>> by using the latest queue management techniques.]
> Delivering low-latency may also be an aim of TE. The least cost path might
> not be the lowest latency path, and if you set the metrics to reflect the
> latency then the least cost path might not have enough bandwidth, and so
> on. TE is a solution to a multi-faceted problem. And, indeed, "if throughput
> is sufficient" is exactly the point, isn't it?
>
> But you're right about congestion (although I could take issue with calling
> building a pipeline as congestion).
>
> What I've done, is to bolster the various descriptions in the terminology
> section to describe the good and the bad, and I've introduced "network 
> congestion" as a new term. Then, throughout the document, I have referred
> mainly to "network congestion" since TE is mainly about reducing the impact
> of congestion within the network. The aim is not to write a thesis on
> congestion, but to give a steer (sic), and I am worried that more words risk
> opening a can of worms and a lack of precision, so I have tried to stay minimal.

[BB] My original point was about the specific para quoted above, which 
[BB] remains unchanged so it doesn't reflect the multifaceted goals of TE.

[AF] This paragraph does remain unchanged. Note, however, that the previous paragraph (before the bullet list) has been expanded to clarify the meaning of "shortest path" and so give context to this bullet. Further, this bullet is clearly presented as a problem that may arise, not a problem that occurs in all circumstances.

[BB] Regarding the new defined term 'network congestion', the definition is 
[BB] sort-of fine, but two problems:
[BB] 1. it would be better if it said it does /not/ include the normal 
[BB] congestion that capacity-seeking sources induce.

[AF] I fear that the term "capacity seeking" is going to be misleading in the context of a document about routing and steering: it will be assumed to imply that the flow is directed to a path that has capacity. Of course, in the context of a protocol like TCP it simply means finding the capacity of the path and then sending at a rate that will make good use of that capacity without swamping the network.

[AF] I understand that you are keen to make it clear that a certain amount of "good congestion" is fine and ensures that network resources are used at their highest capacity and that the flows on those resources get the best possible throughput. I believe that the combination of definition of "Congestion" and "Network congestion" cover this point, especially "Network congestion: Congestion within the network at a specific node or a specific link that is sufficiently extreme that it results in unacceptable queuing delay or packet loss." IMHO, it doesn't matter whether that congestion was induced by capacity-seeking or other events, if it results in unacceptable delay or loss, then it is bad congestion.

[BB] 2. the term 'network congestion' itself uses a pair of words that are 
[BB] always used to mean both types of congestion, so it doesn't improve matters
[BB] How about 'core-network congestion'?

[AF] I think that this document can safely make its own definitions.
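[AF] For what it's worth, my understanding of the capacity-seeking behaviour you describe is roughly the AIMD pattern used by TCP's congestion control: probe upward for the path capacity, back off multiplicatively on loss. A toy sketch (the capacity, rounds, and step values are made-up illustration, not from any spec):

```python
def aimd_rates(capacity, rounds, incr=1.0, decr=0.5):
    """Toy additive-increase/multiplicative-decrease (AIMD) sender:
    probe upward each round until the rate exceeds the bottleneck
    capacity (the 'loss' signal), then halve. The flow deliberately
    operates around the point of congestion at its bottleneck."""
    rate = incr
    history = []
    for _ in range(rounds):
        history.append(rate)
        if rate > capacity:
            rate *= decr   # multiplicative decrease on loss
        else:
            rate += incr   # additive increase while under capacity
    return history

rates = aimd_rates(capacity=10.0, rounds=50)
# The rate saw-tooths just around the bottleneck capacity, i.e. the
# flow keeps inducing (and backing off from) congestion at one node.
```

The saw-tooth is why "good congestion" at the per-customer bottleneck and "bad congestion" elsewhere in the network are different things.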

>> 1.1 Definition of Congestion
>> Quoting 3 instances of similar wording:
[snip]
> I've tried to do this in the new text, but I'd note that the "extreme of congestion" is exactly what we are pursuing with traffic engineering.

[BB] See email last night, re 'beating harder at the door'

[AF] See response on that thread.

>> GENERAL COMMENTS
>>
>> 2. Current Practice?
>
>> Content Distribution (§5.2) is the only TE technique included that is not
>> within the IETF project section.
>>
>> I suggest the following should also be included:
>>
>> * ECMP. This is described rather disparagingly in §6.2 under routing
>> recommendations as if it is not good enough. However it is widely used with
>> n-stage Clos topologies (with n=2 or higher), precisely because it is
>> considered good enough (i.e. cost-effective) by many major operators.
>
> I don't think the text about ECMP is disparaging. ECMP may be very good
> and useful in a whole range of networks and at any technology level, but
> it is not traffic engineering. Certainly not when it is applied (as it mainly is)
> to parallel single hops: here it is a very effective way of making "fatter pipes".

[BB] ECMP is not TE only if TE is defined as something like the quote I gave 
[BB] from Tom Nadeau (and how you and I understand TE). But ECMP is TE under 
[BB] the much broader definition of TE in this draft, which even includes 
[BB] capacity planning. But I understand that you might not want to add to 
[BB] the grab-bag, even though there are precedents for doing so in the draft 
[BB] that you also don't want to remove at this stage.

All I can think to do here is note that the construction of ECMP "bundles" might be regarded as an element of capacity planning. I have done this in a couple of places.
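For background, the per-flow spreading that makes those "fatter pipes" work is typically a hash over the 5-tuple that selects one of the equal-cost next hops. A rough sketch (the hop names are hypothetical, and real routers use vendor-specific hash functions rather than SHA-256):

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    """Pick one of several equal-cost next hops by hashing the 5-tuple,
    so all packets of one flow take the same path (avoiding reordering)
    while different flows spread across the parallel links."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]

hops = ["hop-a", "hop-b", "hop-c", "hop-d"]  # hypothetical parallel links
# The same flow always maps to the same hop:
assert (ecmp_next_hop("10.0.0.1", "10.0.1.1", 6, 12345, 443, hops)
        == ecmp_next_hop("10.0.0.1", "10.0.1.1", 6, 12345, 443, hops))
```

Note the per-flow granularity: ECMP balances the aggregate statistically, but it cannot steer an individual flow, which is part of why it sits at the edge of any definition of TE.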

>> SECTION-BY-SECTION COMMENTS
>>
> §1. Intro
>> "...a preponderance of Internet traffic tends to originate in one autonomous
>> system and terminate in another," This assertion (inherited from RFC3272) needs
>> an up-to-date reference. I thought the opposite had been true for a decade or
>> more, but I have no hard measurement evidence other than Arbor's study in 2010
>> (Craig Labovitz et al, ACM CCR), which found that the majority of inter-domain
>> traffic was flowing to CDNs, and I figured one could assume that most CDN
>> content would then be served multiple times intra-domain.
> I can find no reference.
> I'm a bit reluctant to delete the text, but I understand the abhorrence of unsubstantiated assertions.

[BB] Worse, when it's likely from the ref I mention that the assertion 
is no longer true.

[AF] Hmm. I found a reference from 1997, but that is not something to quote here.
I did some sniffing and I found that most of the CDN sites I access for everyday business are accessed through my local AS, but they are not part of that AS. That means my traffic is inter-domain even when coming from the "local" data centre.
While I still agree that the assertion would be better with a reference, I think it remains true.

>> §1.2 Components of Traffic Engineering
[snip]
>> "Examples of resources are bandwidth, buffers, and queues,..."
>> A queue is not a resource. The buffer is the resource, and the queue uses it.
> How many queues do you have in your router? Can you assign more queues
> on an interface? Can you assign queues for specific purposes?
> Queues are resources.
>
> No change made.

[BB] I think you mean queue types (i.e. the way at config time that a 
[BB] queue is handled in a buffer), not the run-time queue itself. I'd be 
[BB] happy with 'queue types'

[AF] Can you have multiple queues of the same type (i.e., configured with the same in/out rules and size limits)? If so, then I mean queues, not queue types, because each queue of the same type is an available resource. If not, then I mean queues, not queue types, because there is no semantic difference between a single queue and its queue type.

>> §2.4.1 Combating the Congestion Problem
[snip]
>> b) TCP is no longer an appropriate byword for 'responsive traffic', and UDP is
>> no longer a byword for unresponsive traffic, both given the growing prevalence
>> of QUIC over UDP (and of SCTP, DCCP). Pls search the draft for multiple
>> occurrences.
>
> I searched the draft for "responsiv" (sic) and this was the only instance of relevance I could find. Is there something else you saw?
>
> Even in this paragraph, I don't see TCP being used as a byword for responsive traffic, not even as an example.
> I do see UDP being used as an example of unresponsive traffic, and I guess the solution here is to just remove the example.

[BB] In this draft, the word TCP is always used to mean 'the congestion 
[BB] control algorithms (CCAs) used within TCP', which are now also used 
[BB] within QUIC and other transport protocols. And real-time protocols use 
[BB] CCAs that are 'friendly' to these CCAs within TCP and QUIC.
[BB]
[BB] So, you need to search for 'TCP' which should be replaced with 
[BB] 'responsive traffic' in every instance where it is used. Possibly 
[BB] 'responsive traffic such as TCP' at the first occurrence.

Well, OK.

>> §5.1.1.2. Differentiated Services
>> ii) Routing control is not /required/ to deliver acceptable service quality - other
>> techniques, e.g. liberal provisioning, can preserve service after shortest path
>> reroutes around failures.
> s/including routing control/such as routing control/

[BB] See last night's email.

[AF] Response on other thread

>> §2.3.1 says endpoint congestion control is not in primary scope. But, surely,
>> if the draft includes this fairly outdated example of purely endpoint
>> coordination across flows, there should be a full sub-section on multipath
>> transport protocols, which are currently used by in-network control as well as
>> endpoint control (rather than subordinating multipath within the section on
>> ATSSS, which is just one in-network example of the use of multipath
>> transports)? Then the ATSSS section could cross-refer to the new multipath
>> subsection instead of having to include it.
>>
>> The idea of multipath L4 transport was originally developed as an improvement
>> over existing TE techniques, whether deployed solely on endpoints, or with
>> in-network control. At minimum, the original rationale for adding a multipath
>> capability to L4 transport protocols should be referenced:
>>
>>     Wischik, D.; Handley, M. & Bagnulo Braun, M. The Resource Pooling Principle
>>     SIGCOMM Comput. Commun. Rev., ACM, 2008, 38, 47-52
>>
>> When I was in BT, an ex-colleague, the late Peter Key, calculated that
>> in-network traffic engineering would become redundant, if at least about 6% of
>> traffic used multipath at L4. (6% is from memory 'cos I can't find his paper on
>> it - pls don't quote it.)
> I don't know what to do with this comment. I have no expertise in this topic and can't convert your thoughts into valid text for the document. I would be happy to receive suggestions of text.

[BB] I'm afraid I have to draw the line at the point between reviewing 
[BB] and becoming a contributor - need to get on with my day job soon

[AF] Then we'll pass on this one.

>> General Comment
>>
>> Jitter?
>> <RANT> In lists of important traffic characteristic (as in the definition of
>> QoS) pls consider replacing 'jitter' with '99th percentile delay' or another
>> high percentile e.g. 99.9th. Jitter was only relevant when many end devices
>> were analogue. Once the vast majority of devices have memory buffers, the only
>> relevant delay metrics characterize the tail of the distribution. In contrast,
>> jitter is overwhelmingly driven by the shape of the body of the delay
>> distribution, which bears no relation to the tail. Because jitter does not and
>> cannot characterize the seriousness of the actual delay that a buffered
>> receiver will play out, it just confuses everyone into seeing problems where
>> there are none, and missing where the real problems are. </RANT>
> Anecdotally, I experience a huge amount of jitter at home in real-time apps. 
> This is, of course, because buffering is not something you want too much of
> or the app ceases to be real time. Thus, variations in delay become very 
> noticeable.
>
> It's not about the percentile of delay. It's about what happens each time the
> delay exceeds the buffering.
>
> I'd be willing to accept that this is not the real description of the problem,
> and that there are real problems not being described, but I'd need to see
> the description of those problems.

[BB] My rant regards the way jitter is defined. If it is used as a lay 
[BB] term for variation, it's better to say delay variation to avoid the 
[BB] un-useful precise meaning of jitter.

Cloudflare's Speed Test tool says, "Jitter is calculated as the average distance between consecutive RTT measurements." [1]
Cisco says, "Jitter is defined as a variation in the delay of received packets." [2]
The Network Encyclopedia says, "In the context of computer networks, packet jitter or packet delay variation (PDV) is the variation in latency as measured in the variability over time of the end-to-end delay across a network."

[1] https://speed.cloudflare.com/ 
[2] https://www.cisco.com/c/en/us/support/docs/voice/voice-quality/18902-jitter-packet-voice.html
[3] https://networkencyclopedia.com/jitter/ 

I think that, in the context of a packet-based world, we can stick with "the lay term".
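To make the two metrics concrete, here is a rough sketch computing jitter as in the Cloudflare definition quoted above (mean absolute distance between consecutive RTT measurements) alongside a nearest-rank high-percentile delay. The two traces are invented: one smooth with a single 120 ms spike, one constantly wobbling with no spike.

```python
from math import ceil

def jitter(rtts):
    """Mean absolute distance between consecutive RTT measurements
    (the Cloudflare-style definition quoted above)."""
    diffs = [abs(b - a) for a, b in zip(rtts, rtts[1:])]
    return sum(diffs) / len(diffs)

def percentile(samples, p):
    """Nearest-rank percentile of the delay samples."""
    ordered = sorted(samples)
    return ordered[ceil(p / 100 * len(ordered)) - 1]

# Hypothetical delay traces (ms).
smooth_with_spike = [20.0] * 100
smooth_with_spike[50] = 120.0          # one serious tail event
wobbly_no_spike = [20.0, 30.0] * 50    # noisy body, harmless tail

print(jitter(wobbly_no_spike), jitter(smooth_with_spike))
print(percentile(wobbly_no_spike, 99.9), percentile(smooth_with_spike, 99.9))
```

The wobbly trace scores about five times the jitter of the spiky one, yet its 99.9th percentile delay is a harmless 30 ms while the spiky trace's is 120 ms, which is the crux of the disagreement: jitter is driven by the body of the distribution, the high percentile by the tail.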

>> ==NITS==
>> §2.2 Network Domain Context
>>
>> "This requirement is clarified in [RFC2475] which also provides an architecture
>> for Differentiated Services (Diffserv)." Suggest 'also' is removed.
> Personal style.

[BB] This sounds like RFC2475 is primarily written to clarify this 
[BB] requirement, and incidentally it also happens to provide an architecture 
[BB] for Diffserv.
[BB]
[BB] How about 'This requirement is clarified in the architecture for 
[BB] Differentiated Services (Diffserv) [RFC2475]'

Sure

>> §2.4.1 Combating the Congestion Problem
>>
>> "Many of these adaptive schemes rely on measurement systems." -> "These
>> adaptive schemes rely on measurement systems." [How could an adaptive scheme
>> not rely on measurement?]
> Fixed
>
>> "RED provides congestion avoidance which is not worse than traditional
>> Tail-Drop (TD) queue management." not worse -> better [I don't think the
>> intention was to damn with faint praise].
> I suppose the intent of "not worse than" is "better than or equivalent to", so I used that.

[BB] See last night's email - if the intent was 'not worse than', it was 
[BB] wrong.

Responded on other thread.
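For reference, the RED mechanism under discussion drops probabilistically on a smoothed queue length, rather than only when the buffer is full as Tail-Drop does. A minimal sketch of the drop decision (thresholds and max probability are illustrative values, and the EWMA computation of the average queue length is omitted):

```python
import random

def red_drop(avg_qlen, min_th=5.0, max_th=15.0, max_p=0.1, rng=random.random):
    """RED (Random Early Detection) drop decision: never drop below
    min_th, drop with probability rising linearly to max_p between
    min_th and max_th, always drop above max_th. Tail-Drop, by
    contrast, only drops once the buffer is completely full."""
    if avg_qlen < min_th:
        return False
    if avg_qlen >= max_th:
        return True
    p = max_p * (avg_qlen - min_th) / (max_th - min_th)
    return rng() < p
```

The early, randomized drops are what give RED its claim to be better than Tail-Drop for responsive traffic: senders back off before the queue fills, and synchronized global back-off is avoided.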

>> §7. Inter-Domain Considerations
>>
>> Could mention that L4 multipath transport protocols (whether controlled by
>> endpoints or in-network ) were designed to shift traffic between domains (and
>> they are doing so).
> Yeah. I added a few words. Interestingly, there is still the issue of visibility and
> trust for inter-domain information.

[BB] See last night's email.

[AF] Response on other thread


-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call



