Adrian,
Sorry, I overlooked your later inline responses below before sending my
follow-up review of this new rev, sent last night (27 Jul).
See [BB] below.
On 27/07/2023 17:00, Adrian Farrel wrote:
Hi again Bob,
Once again, many thanks for the thorough review. This time with detailed comments.
IMO, the comments and the nits all need to be fixed as well, but it all
depends how much effort you want to put into perfecting this.
No effort is too much effort to make a perfect RFC.
I haven't pointed out nits already raised by other reviewers.
And it might be hard to believe, but I have tried to limit the number
of comments I've raised!
Ah, shucks, you shouldn't spare our feelings.
==SERIOUS - PLS FIX==
1. The "Congestion Problem"
From a transport area perspective, there is one particularly glaring omission.
A very large majority of Internet traffic is either capacity-seeking or at
least adaptive to available capacity. This traffic intentionally induces
congestion at the path bottleneck, which is 'good congestion', because it
maximizes capacity utilization and minimizes completion time. So when this
draft repeatedly says that the goal of TE is to combat "The Congestion
Problem", it needs to explain why one part of the IETF is trying to induce
congestion while another (this draft) is trying to combat it.
The explanation is that most network operators design their networks with one
node per-customer (or per-customer-site) as the path bottleneck (or two nodes,
if dual-homed). Then this node (typically the multi-service edge or equivalent)
is where operators can focus deployment of traffic management and control
functions including service differentiation, while other nodes can be
overprovisioned so that they either do not need these functions at all, or they
only need much-simplified functions that complement the primary controls at the
edge node.
Once this context has been explained, the goal of TE is indeed to /avoid/
congestion at all these other nodes, while the goal of endpoints is to /induce/
congestion at their bottleneck node (but only when they have something to send
or receive - the rest of the time they are idle).
Each occurrence of 'congestion problem' will then need to be qualified. Eg:
* "Clearly, congestion is highly undesirable."
* "Congestion is one of the most significant problems in an operational IP
context."
* "If traffic from a source to a destination exceeds the capacity of a link
along the shortest path, the link (and hence the shortest path) becomes
congested while a longer path between these two nodes may be under-
utilized."
[Given latency is important to many / most applications, if throughput is
sufficient, it would be wrong to 'solve' this 'problem' by using the longer
path. The solution would be to minimize the delay that results from congestion
by using the latest queue management techniques.]
Delivering low-latency may also be an aim of TE. The least cost path might not be the lowest latency path, and if you set the metrics to reflect the latency then the least cost path might not have enough bandwidth, and so on. TE is a solution to a multi-faceted problem. And, indeed, "if throughput is sufficient" is exactly the point, isn't it?
But you're right about congestion (although I could take issue with describing building a pipeline as congestion).
What I've done is to bolster the various descriptions in the terminology section to describe the good and the bad, and I've introduced "network congestion" as a new term. Then, throughout the document, I have referred mainly to "network congestion" since TE is mainly about reducing the impact of congestion within the network. The aim is not to write a thesis on congestion, but to give a steer (sic), and I am worried that more words risk opening a can of worms and a lack of precision, so I have tried to stay minimal.
[BB] My original point was about the specific para quoted above, which
remains unchanged so it doesn't reflect the multifaceted goals of TE.
Regarding the new defined term 'network congestion', the definition is
sort-of fine, but two problems:
1. it would be better if it said it does /not/ include the normal
congestion that capacity-seeking sources induce.
2. the term 'network congestion' itself uses a pair of words that are
always used to mean both types of congestion, so it doesn't improve matters.
How about 'core-network congestion'?
1.1 Definition of Congestion
Quoting 3 instances of similar wording:
* "...traffic incident on the resource exceeds its output capacity over an
interval of time."
* "...for an interval of time, the arrival rate of packets exceeds the output
capacity of the resource."
* "...sustained overload over an interval of time."
These statements define just the extreme of congestion, not all congestion. If
the input exceeded the output for a sustained interval of time, the queue would
very quickly overflow the buffer and there would be sustained high levels of
loss. That is sustained overload, not just 'congestion'.
Congestion controlled traffic ensures that the input and the output are roughly
matched on average, alternating which is higher at a fast timescale. This is
congestion, even though the average queue remains stable. The more congestion
controlled /flows/ there are, the faster the queue cycles, so the loss fraction
is higher. But input and output are still matched on average (contrary to the
definition given).
Even if there is a high proportion of unresponsive traffic alongside the
congestion controlled traffic, input will not exceed output over a sustained
interval unless the sum of the unresponsive traffic /alone/ exceeds capacity.
The normal percentage of unresponsive traffic in today's Internet is probably
under 1% (since QUIC became prevalent, it has not been so easy to measure how
much UDP traffic is unresponsive).
It's difficult to decide, here, whether this figure is achieved because of underlying TE, because of careful policing at the edges, because of tools in the transport layer, because of adequate capacity planning, or a mix of all of these. And we shouldn't get into this discussion on this thread or in this document. (A fascinating topic that is well worth investigation and research, just not today).
If we define congestion, as you seem to do, as any moment when input exceeds output, but including within the bounds of queuing and buffering, that doesn't seem like an issue. As you say: good congestion. So the trick here appears to be to clarify the good and the bad, and then focus on the bad.
I've tried to do this in the new text, but I'd note that the "extreme of congestion" is exactly what we are pursuing with traffic engineering.
[BB] See email last night, re 'beating harder at the door'
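As a rough illustration of the point in 1.1 that congestion-controlled traffic produces queuing and loss without sustained overload, here is a minimal sketch of a toy fluid model: one additive-increase, multiplicative-decrease source feeding one bottleneck queue. Everything in it (the function name, the capacity of 100 packets per tick, the 50-packet buffer, the halving on loss) is an assumption made up for illustration, not taken from the draft or from any real CCA.

```python
def simulate_aimd_bottleneck(capacity=100.0, buffer_limit=50.0, ticks=10_000):
    """Toy fluid model: one AIMD source feeding one FIFO bottleneck queue.

    All parameters are hypothetical. Rates are in packets per tick.
    """
    rate = capacity / 2          # sender's current sending rate
    queue = 0.0                  # packets waiting in the buffer
    arrived = 0.0
    dropped = 0.0
    ticks_over = 0               # ticks where arrivals exceed capacity
    ticks_under = 0              # ticks where arrivals are below capacity

    for _ in range(ticks):
        arrived += rate
        if rate > capacity:
            ticks_over += 1
        elif rate < capacity:
            ticks_under += 1

        # The link drains `capacity` packets per tick; the queue cannot go negative.
        queue = max(queue + rate - capacity, 0.0)

        if queue > buffer_limit:
            # Buffer overflow: the excess is lost and the sender halves its rate.
            dropped += queue - buffer_limit
            queue = buffer_limit
            rate /= 2
        else:
            # No loss seen: probe for more capacity.
            rate += 1.0

    return {
        "mean_arrival_rate": arrived / ticks,
        "dropped": dropped,
        "loss_fraction": dropped / arrived,
        "ticks_over": ticks_over,
        "ticks_under": ticks_under,
    }


if __name__ == "__main__":
    for name, value in simulate_aimd_bottleneck().items():
        print(f"{name}: {value}")
```

With these made-up parameters the source repeatedly overshoots, fills the buffer, and loses packets, yet its long-run mean arrival rate stays below the link capacity. A definition based on "arrival rate exceeds output capacity over an interval" only matches the short overshoot phases of each cycle.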
==COMMENTS==
GENERAL COMMENTS
2. Current Practice?
§5.1 on "IETF projects related to TE" needs to say that not all these practices
have been widely adopted in practice, and some are still too immature to know
whether they will be adopted. Otherwise, on reaching §8 "Contemporary TE
practices", an obvious question arises: "Well, what were the previous 56 pages
about?" There are sometimes hints in the wording when a technology is not yet
adopted or deployed, e.g. the use of phrases like 'can be useful', 'may be
used', but I guess it's hard to pass judgement on likely deployment.
Fair enough. Added a caveat at the start of 5.1 to make this clear. Since the document will persist (for 20 years like its predecessor?) we have to be a bit careful about describing deployments. We just present the technologies as a compendium.
Content Distribution (§5.2) is the only TE technique included that is not
within the IETF project section.
Section 5.2 is inherited almost verbatim from RFC 3272 (Section 4.7). Personally, I'm not sure it belongs in this document, but there was no consensus to remove it, it's good background, and it is certainly not untrue.
I suggest the following should also be included:
* ECMP. This is described rather disparagingly in §6.2 under routing
recommendations as if it is not good enough. However it is widely used with
n-stage Clos topologies (with n=2 or higher), precisely because it is
considered good enough (i.e. cost-effective) by many major operators.
I don't think the text about ECMP is disparaging. ECMP may be very good and useful in a whole range of networks and at any technology level, but it is not traffic engineering. Certainly not when it is applied (as it mainly is) to parallel single hops: here it is a very effective way of making "fatter pipes".
[BB] ECMP is not TE, only if TE is defined sthg like the quote I gave
from Tom Nadeau (and how you and I understand TE). But ECMP is TE under
the much broader definition of TE in this draft, which even includes
capacity planning. But I understand that you might not want to add to
the grab-bag, even though there are precedents for doing so in the draft
that you also don't want to remove at this stage.
§8 says "Service providers apply many of the TE mechanisms described in this
document..." I think it also needs to be said that many service providers do
not.
Sure.
SECTION-BY-SECTION COMMENTS
§1. Intro
"...a preponderance of Internet traffic tends to originate in one autonomous
system and terminate in another," This assertion (inherited from RFC3272) needs
an up-to-date reference. I thought the opposite had been true for a decade or
more, but I have no hard measurement evidence other than Arbor's study in 2010
(Craig Labovitz et al, ACM CCR), which found that the majority of inter-domain
traffic was flowing to CDNs, and I figured one could assume that most CDN
content would then be served multiple times intra-domain.
I can find no reference.
I'm a bit reluctant to delete the text, but I understand the abhorrence of unsubstantiated assertions.
[BB] Worse, when it's likely from the ref I mention that the assertion
is no longer true.
§1.1 What is Internet TE?
It might be useful to give examples of practices that are /not/ TE. For
instance, "Other functions that regulate the flow of traffic through the
network" surely includes endpoint congestion control algorithms (CCAs), which I
don't think this definition intended to include. I was surprised to find that
capacity planning is categorized as TE - I thought TE was what you do within
capacity constraints. Similarly, I didn't think queue management and scheduling
would be categorized as TE. Given the document doesn't discuss different types
of scheduling at all, and its discussion of AQM is outdated (see later), it
doesn't seem wise to have cast the scope so wide. Also, although these
definitions are preceded by 'In this document' it might be worth saying what
some other definitions of TE exclude. For instance, until reading this draft, I
thought TE was defined as: "Traffic engineering (TE) is a process whereby a
network operator can engineer the paths used to carry traffic flows that vary
from those chosen automatically by the routing protocol(s) in use in that same
network." [Tom Nadeau, "Offline Traffic Engineering", MPLS Network Management
(2003)].
Hah, funny. I did the manuscript review of that book and made many text suggestions to Tom. I think I'd stick with the definition in Tom's book, but I would note that "engineer" is a very open term that encompasses fiddling with the behaviours of elements of the path.
So, please note that the bullet you quote is a bullet about "traffic management" where "The optimization aspects of TE can be achieved through capacity management and traffic management."
I find myself reluctant to add text that starts to list things that are not TE. Where would we end that list?
It would be possible to provide counter examples where the naif reader might think that something is in scope.
I'd need specific and precise text suggestions.
Leaving the AQM discussion until later.
§1.2 Components of Traffic Engineering
This new section seems at odds with the very similar existing section "Traffic
Engineering Process Models" (§3) and particularly §3.1, which opens with "The
key components of the traffic engineering process model are as follows." Why
was it necessary to add §1.2 with similar scope, but different content? Unlike
§3 it doesn't list 'Measurement' or 'Analysis' as a component of TE, which
surely can't be correct. And it implies that resource reservation is the only
path steering approach.
3.1 is a very long way into the document to be defining what TE is, so something like 1.2 is needed at the top of the document.
Actually, the three elements in 1.2 were found to be fundamental and the lack of knowledge of these three elements was probably the main motivation for running this bis.
I'd also say that section 3 is about the process model which seems to be about how you go about using TE, not what TE is. That's why 3.1 includes the element of measurement and analysis. These are the process functions that enable policy, steering, and reservation in appropriate use.
To pick at your details...
I don't think there is any implication that resource reservation is a path steering approach (let alone the only one). I read and re-read the section and I found it very clear that path steering and resource reservation are different things.
No change made.
"Examples of resources are bandwidth, buffers, and queues,..."
A queue is not a resource. The buffer is the resource, and the queue uses it.
How many queues do you have in your router? Can you assign more queues on an interface? Can you assign queues for specific purposes?
Queues are resources.
No change made.
[BB] I think you mean queue types (i.e. the way at config time that a
queue is handled in a buffer), not the run-time queue itself. I'd be
happy with 'queue types'
"...rate-shaping mechanisms that are typically supported via queuing." In
aggregates, using queuing for rate shaping has become less acceptable than
using dropping nowadays.
OK. Well.
The full quote is...
* Resource allocation is the data plane aspect of resource
management. It provides for the allocation of specific node and
link resources to specific flows. Example resources include
buffers, policing, and rate-shaping mechanisms that are typically
supported via queuing.
In other words, that element of rate-shaping that is supported via queuing is "resource allocation." Rate-shaping that is supported by dropping would not count as resource allocation.
The hint here is that the text uses "that" not "which".
No change made.
[BB] Subtle but true.
On a related note, traffic mapping is first mentioned in §6.3. Should it not be
mentioned earlier as one of the components of TE?
I see it in 2.4.
However, it should be in 1.4. Thanks for catching this.
§1.3 Scope
It seemed odd to call the subject of the draft 'Internet TE', then define the
scope as intra-domain, given Internet means Internetwork. It might be worth
explaining that this is intended to mean 'TE of the Internet service' not 'TE
of the Internet'.
I think you are right to call this out. And I believe the answer is that the scope would be inter-domain (or multi-domain) TE if it was available, but it isn't. So the text should add, "...because this is the practical level of TE technology that exists in the Internet at the time of writing."
The draft focuses nearly exclusively on MPLS as the transport technology
(Ethernet transport is mentioned a couple of times, but only in passing). It
might be worth saying this in the scope section.
I contest that. I think that the document barely mentions transport technologies, but acknowledges MPLS as an IP technology.
§2.4.1 Combating the Congestion Problem
Under short timescale, a lengthy passage about AQM is provided (and no other
examples of short timescale technologies are given). This seems out of place at
this point, where no other technology is described in such depth. It would seem
more consistent to have a subsection on AQM in §5, and refer forward to it from
here. Having said that, the text on AQM is outdated in three respects: RED, TCP
and LQD.
a) RFC7567 (which is a BCP and cited in this section) effectively deprecates
RED, as follows:
"With an appropriate set of parameters, RED is an effective algorithm.
However, dynamically predicting this set of parameters was found to
be difficult. As a result, RED has not been enabled by default, and
its present use in the Internet is limited. Other AQM algorithms
have been developed since RFC 2309 was published, some of which are
self-tuning within a range of applicability. Hence, while this memo
continues to recommend the deployment of AQM, it no longer recommends
that RED or any other specific algorithm is used by default. It
instead provides recommendations on IETF processes for the selection
of appropriate algorithms, and especially that a recommended
algorithm is able to automate any required tuning for common
deployment scenarios."
However, it is understandable that this draft needs to introduce RED, because
it is the basis of WRED, which this text is working towards introducing as part
of Diffserv AF. So it would be perhaps best to say sthg like:
"RFC7567 recommends self-tuning AQM algorithms like those that the IETF has
since published [RFC8290, RFC8033, RFC8034, RFC9332], but RED is still
appropriate for links with stable bandwidth, if configured carefully."
That's a nice edit, thanks.
b) TCP is no longer an appropriate byword for 'responsive traffic', and UDP is
no longer a byword for unresponsive traffic, both given the growing prevalence
of QUIC over UDP (and of SCTP, DCCP). Pls search the draft for multiple
occurrences.
I searched the draft for "responsiv" (sic) and this was the only instance of relevance I could find. Is there something else you saw?
Even in this paragraph, I don't see TCP being used as a byword for responsive traffic, not even as an example.
I do see UDP being used as an example of unresponsive traffic, and I guess the solution here is to just remove the example.
[BB] In this draft, the word TCP is always used to mean 'the congestion
control algorithms (CCAs) used within TCP', which are now also used
within QUIC and other transport protocols. And real-time protocols use
CCAs that are 'friendly' to these CCAs within TCP and QUIC.
So, you need to search for 'TCP' which should be replaced with
'responsive traffic' in every instance where it is used. Possibly
'responsive traffic such as TCP' at the first occurrence.
c) Even at the time RFC3272 was written, it says LQD was theoretical. If a
deployed ref would be preferred, I suggest AFD, which was implemented by Cisco,
at least, and is still available AFAICT. Pan, R.; Breslau, L.; Prabhakar, B. &
Shenker, S. "Approximate Fairness Through Differential Dropping" ACM SIGCOMM
Computer Communication Review, 2003, 33, 23-40
I've added rather than replaced. Thanks for the reference.
§5.1. Overview of IETF Projects Related to Traffic Engineering
After §5.1.1 on Intserv (or after the section on scalability within it), it
would surely be worth mentioning Pre-Congestion Notification (PCN)
https://datatracker.ietf.org/wg/pcn/documents/ , which solves the scaling
problems of Intserv by using measurement-based admission control (and flow
termination to handle failures) between edge-nodes. Nodes between the edges of
the internetwork have no per-flow operations and the edge nodes can use RSVP
per-flow or per-aggregate. It was implemented by a number of equipment vendors.
That's good. Thanks.
I used RFC 5559 as the base reference.
Also, in §5.1.5 on DETNET, it would be worth mentioning that DETNET would
suffer from the same scaling problems described in the Intserv section, but
DETNET's domain of applicability is considered small enough for this to be
acceptable.
OK
§5.1.1.2. Differentiated Services
"The Diffserv model deals with traffic management issues on a per hop basis.
... Other TE capabilities, such as capacity management (including routing
control), are also required in order to deliver acceptable service quality in
Diffserv networks"
The above-quoted para is problematic:
i) Diffserv is not solely per-hop (which contradicts the complementary mix of
domain edge and per-hop functions explained 2 paras earlier).
Our working definition of "traffic management" is stated as...
traffic management includes:
1. Nodal traffic control functions such as traffic conditioning,
queue management, and scheduling.
2. Other functions that regulate the flow of traffic through the network
or that arbitrate access to network resources between different
packets or between different traffic flows.
So, I would agree that the network edge plays a part in DiffServ (as, indeed, does the network wide planning of the impact of DiffServ) as mentioned in the paragraph 2 earlier. But this paragraph is talking about "traffic management" and it appears (to me) that in that context and within a domain, DiffServ is handling traffic management at each hop and with no coordination between hops.
But we can add a little clarity to this by mentioning the role of the edge.
ii) Routing control is not /required/ to deliver acceptable service quality - other
techniques, e.g. liberal provisioning, can preserve service after shortest path
reroutes around failures.
s/including routing control/such as routing control/
[BB] See last night's email.
§5.1.1.4. (Layer-4) Transport-Based TE
When the draft says no support for ATSSS splitting has yet been developed for
QUIC, it would be worth explaining why (e2e cryptographic protection), and
possibly referencing multipath QUIC [draft-ietf-quic-multipath]. It seems
rather odd to say so much about QUIC (which ATSSS does not support) and so
little about MPTCP (which ATSSS does use).
Taking Med's solution to this per his email.
§5.1.2.3. Network Slicing
This section seems out of scope or at best aspirational - should it be deleted?
It admits itself that "IETF network slices are not, of themselves, TE
constructs. However, a network operator that offers IETF network slices is
likely to use many TE tools in order to manage their network and provide the
services." Further, it doesn't point to any work on how this might be done,
particularly what information visibility would be necessary to coordinate
multiple slices.
What can I say? This is a subsection of "IETF Approaches Relying on TE Mechanisms" so what you quote is consistent with where the subsection is located.
[BB] Good point.
Is IETF network slicing aspirational? draft-ietf-teas-ietf-network-slices is on the next IESG telechat. draft-ietf-teas-enhanced-vpn has completed working group last call. A number of solutions based on pre-existing TE concepts are being progressed (mainly in TEAS, but with work in SPRING and LSR). There are several documents explaining how crucial IETF network slicing is for 5G deployments.
§5.1.3.7. Flow Measurement
The RTFM WG concluded in Oct 2000. Should the draft not discuss IPFIX (the open
standards development of Cisco's Netflow), which ran from 2001-2015? See the
comparison with RTFM at
https://datatracker.ietf.org/doc/html/rfc5472#section-3.6 Since that
comparison, IPFIX was developed a lot further too; see
https://datatracker.ietf.org/group/ipfix/documents/ .
Yes, it is worth mentioning IPFIX in this section and referring to 5472. As 5472 says, the two architectures are sympathetic, but you are right that the IPFIX protocol is significantly implemented.
I'm also including a reference to RFC 7011
§5.1.3.8. Endpoint Congestion Management (and the omission of a section on
multipath L4 transports)
§2.3.1 says endpoint congestion control is not in primary scope. But, surely,
if the draft includes this fairly outdated example of purely endpoint
coordination across flows, there should be a full sub-section on multipath
transport protocols, which are currently used by in-network control as well as
endpoint control (rather than subordinating multipath within the section on
ATSSS, which is just one in-network example of the use of multipath
transports)? Then the ATSSS section could cross-refer to the new multipath
subsection instead of having to include it.
The idea of multipath L4 transport was originally developed as an improvement
over existing TE techniques, whether deployed solely on endpoints, or with
in-network control. At minimum, the original rationale for adding a multipath
capability to L4 transport protocols should be referenced:
Wischik, D.; Handley, M. & Bagnulo Braun, M. The Resource Pooling Principle
SIGCOMM Comput. Commun. Rev., ACM, 2008, 38, 47-52
When I was in BT, an ex-colleague, the late Peter Key, calculated that
in-network traffic engineering would become redundant, if at least about 6% of
traffic used multipath at L4. (6% is from memory 'cos I can't find his paper on
it - pls don't quote it.)
I don't know what to do with this comment. I have no expertise in this topic and can't convert your thoughts into valid text for the document. I would be happy to receive suggestions of text.
[BB] I'm afraid I have to draw the line at the point between reviewing
and becoming a contributor - need to get on with my day job soon
§6.1. Generic Non-functional Recommendations
Stability: I suspect this para might be talking about flap, but I don't believe
I've seen the problem spelled out, so perhaps it should be here. That is, when
endpoint CCAs (or TE systems in neighbouring domains) interact with TE such
that, when the TE of one domain moves an aggregate, CCAs rapidly restore the
original imbalance, possibly causing the TE system to flap.
Yes. While stability is introduced in 1.1, and defined in 1.4, it is not discussed elsewhere. I will clarify "stability" by reincluding the definition at this point.
§9. Security Considerations
Shouldn't it be said here that external control interfaces (e.g. ALTO and the
other approaches in §5.1.2) have to trade off providing flexibility to
customers with opening up control of a network's internals to potentially
malicious actors?
That's a nice point.
General Comment
Jitter?
<RANT> In lists of important traffic characteristics (as in the definition of
QoS) pls consider replacing 'jitter' with '99th percentile delay' or another
high percentile e.g. 99.9th. Jitter was only relevant when many end devices
were analogue. Once the vast majority of devices have memory buffers, the only
relevant delay metrics characterize the tail of the distribution. In contrast,
jitter is overwhelmingly driven by the shape of the body of the delay
distribution, which bears no relation to the tail. Because jitter does not and
cannot characterize the seriousness of the actual delay that a buffered
receiver will play out, it just confuses everyone into seeing problems where
there are none, and missing where the real problems are. </RANT>
Anecdotally, I experience a huge amount of jitter at home in real-time apps. This is, of course, because buffering is not something you want too much of or the app ceases to be real time. Thus, variations in delay become very noticeable.
It's not about the percentile of delay. It's about what happens each time the delay exceeds the buffering.
I'd be willing to accept that this is not the real description of the problem, and that there are real problems not being described, but I'd need to see the description of those problems.
[BB] My rant regards the way jitter is defined. If it is used as a lay
term for variation, it's better to say delay variation to avoid the
un-useful precise meaning of jitter.
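The disagreement above can be made concrete with a small sketch. It assumes one particular reading of 'jitter' (the mean absolute difference between consecutive packet delays); other definitions exist, and neither the traces nor the 50 ms playout buffer below come from the draft or this thread. The function names are hypothetical.

```python
def mean_abs_delay_variation(delays_ms):
    """'Jitter' as mean |d[i+1] - d[i]| (one common reading; an assumption here)."""
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs)


def percentile(delays_ms, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(delays_ms)
    rank = (p * len(ordered) + 99) // 100   # integer ceil(p * n / 100)
    return ordered[max(rank, 1) - 1]


def late_fraction(delays_ms, playout_buffer_ms):
    """Fraction of packets arriving too late for a fixed playout buffer."""
    late = sum(1 for d in delays_ms if d > playout_buffer_ms)
    return late / len(delays_ms)


# Synthetic trace A: delay wobbles constantly between 10 ms and 30 ms.
steady_wobble = [10 if i % 2 == 0 else 30 for i in range(1000)]
# Synthetic trace B: delay is a flat 10 ms, except for a 200 ms spike every 50th packet.
rare_spikes = [200 if i % 50 == 25 else 10 for i in range(1000)]

if __name__ == "__main__":
    for name, trace in (("steady_wobble", steady_wobble), ("rare_spikes", rare_spikes)):
        print(
            name,
            "jitter:", round(mean_abs_delay_variation(trace), 2),
            "p99:", percentile(trace, 99),
            "late@50ms:", late_fraction(trace, 50),
        )
```

Under these assumptions trace A has the larger 'jitter' but no packet ever misses a 50 ms playout buffer, while trace B has the smaller 'jitter' but a much worse 99th percentile and 2% of packets arriving late. That is the sense in which a high-percentile delay tracks "what happens each time the delay exceeds the buffering" more closely than this jitter measure does.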
==NITS==
§1.1 What is Internet TE?
"...utilizing network resources without waste": too strong; how about "without
excessive waste"?
Ack
"while reacting to the real-time statistical behavior of traffic" -> "while
reacting to statistical measures of the real-time behavior of traffic"
Ack
in the later case -> latter
Nice
§1.2 Components of Traffic Engineering
Standard TE solutions -> TE solutions
Ack
§1.4. Terminology
Please include explanations of the following terms, which have confusable
alternative meanings:
* 'global' is invariably used (17 occurrences) in the sense of domain-wide
(although it clearly means globe-wide in phrases like 'the global Internet
infrastructure', "global network provider" and "globally interconnected
network"). The phrase 'global synchronization' is an exception to both cases.
Yeah.
Common usage is not good with this word.
I have tried to fix the document with various changes of "domain-wide", "network-wide", "world-wide", <delete because meaningless hyperbole>, "full", etc.
However, I left Section 4.4 and references to Global Concurrent Optimization cos that's a thing with an RFC. Also "global" in the context of SR SIDs because that is a thing, and there is both an explanation and a reference.
* 'end-to-end' is used in the sense of edge-to-edge, contrary to the common
IETF (or at least transport and application area) use of the phrase meaning
application endpoint to application endpoint.
Well, it is slightly complicated, and there is a lot of material out there "contrary to the common IETF use". And the distinction is used deliberately in, for example, "end-to-end protection" where the full length of the path is protected in contrast to protecting segments of the path between the edges of protection domains. But, in that example, the path runs between the edges of the TE domain and not between the original source and the ultimate destination. In short, there is a hierarchy and one person's edge is another person's end.
I have cleaned out uses of "end-to-end" where they were not necessary, and I have added some (waffle) text as a definition.
* 'transport' is generally used to mean below IP, as one would expect in a TE
document. But it is also used to describe L4 protocols, e.g. the title and
content of §5.1.1.4. "Transport-Based TE" and in §5.1.3.8. "Endpoint Congestion
Management". I suggest 'layer-4 transport' is used in these latter cases.
Ah, good.
And some comments on existing definitions:
* Effective bandwidth - although the definition given is not incorrect it is
not as precise as the mathematical definition, so perhaps a reference should be
added, e.g.
F. Kelly. Notes on effective bandwidths. In Stochastic Networks: Theory and
Applications. Oxford University Press, 1996.
OK
* Hot-spot - better defined as an element or sub-system in a considerably
higher state of congestion /than others/.
OK
* Inter-domain is defined, but intra-domain is not (perhaps both are obvious
and neither is needed?).
Right, and there is a whole section explaining inter-domain.
* Offline/online TE are defined as "exists outside of/within the network", but
surely they're defined by when they operate, not where (e.g. online TE can be
located outside the network, e.g. SDN).
I would have thought so, too.
But the text is consistent in its usage, and the TE literature seems to agree.
One of the reasons why it is important to have a definitions section!
* Traffic flow: It defines flow as between 2 endpoints, and says a common
classification for a flow is a 5-tuple. However, wherever 'flow' is used in the
body of the document, I'm pretty sure it is more likely to be intended to mean
an aggregate flow. So the way this definition is worded has great potential for
confusion.
Clarified.
Flow-size distribution.
On a related point, it would be useful to explain that the very large majority
of 5-tuple flows are very brief (mice - indeed single-packet flows massively
predominate). So a number of TE techniques are designed to shift elephant flows
around, or to shift aggregates likely to contain elephants. The practicality of
numerous technologies described in this draft depends heavily on the definition
of 'flow'.
Clarified.
* southbound - perhaps needs a definition?
The term is no longer used in the document, so I haven't included a definition.
§2.2 Network Domain Context
"This requirement is clarified in [RFC2475] which also provides an architecture
for Differentiated Services (Diffserv)." Suggest 'also' is removed.
Personal style.
[BB] This sounds like RFC2475 is primarily written to clarify this
requirement, and incidentally it also happens to provide an architecture
for Diffserv.
How about 'This requirement is clarified in the architecture for
Differentiated Services (Diffserv) [RFC2475]'
§2.4 Solution Context
"A collection of online and possibly offline tools and mechanisms for
measurement, characterization, modeling, and control of traffic, and control
over the placement and allocation of network resources, as well as control over
the mapping or distribution of traffic onto the infrastructure."
This list needs to be split, 'cos measurement cannot be offline.
Fixed
§2.4.1 Combating the Congestion Problem
"Many of these adaptive schemes rely on measurement systems." -> "These
adaptive schemes rely on measurement systems." [How could an adaptive scheme
not rely on measurement?]
Fixed
"RED provides congestion avoidance which is not worse than traditional
Tail-Drop (TD) queue management." not worse -> better [I don't think the
intention was to damn with faint praise].
I suppose the intent of "not worse than" is "better than or equivalent to", so I used that.
[BB] See last night's email - if the intent was 'not worse than', it was
wrong.
"RED reduces the possibility of global synchronization where retransmission
bursts become synchronized across the whole network" Global synchronization
only means synchronization between all the flows sharing the same bottleneck,
not the whole network. Also it's primarily about synchronization of the
sawtoothing window variations of each flow; retransmissions will not
synchronize unless the paths all have the same RTT.
Reduced to...
Importantly, RED reduces the
possibility of retransmission bursts becoming synchronized within the network
"All the policies described above for the long and medium time scales can be
categorized as being reactive." Given the shorter the timescale, the more it is
likely that a solution will be reactive, this odd choice of sentence conveys
the opposite impression, even though it is strictly not incorrect. For
instance, long timescale activities like capacity expansion certainly /can/ be
reactive in theory, but normally they are preventative.
Fundamental typo!
It should say "short and medium"
§2.5 Implementation and Operational Context
"The operational context of Internet TE is characterized by constant changes
that occur at multiple levels of abstraction." I think this intended to say
multiple levels of granularity [something can be described or modelled at a
level of abstraction, but surely real changes do not occur at a level of
abstraction]. Similarly, in § 3.1 "Measurement in support of the TE function
can occur at different levels of abstraction." -> granularity
I understand your point. There is quite a lot of use of "abstraction" in the TE literature (for example RFC 6805).
"Granularity" doesn't feel right because it feels like "big change or little change".
§4.1 Time-Dependent Versus State-Dependent Versus Event-Dependent
"learning models, as in the success-to-the-top (STT) method" [needs a ref &
perhaps a brief description]
Added RFC 6601
"a fully functional TE system is likely to use all aspects of time-dependent,
state-dependent, and event-dependent methodologies as described in Section
4.1." [Shouldn't this point be made in §4.1?]
Can do. Not sure it is critical, but why not?
§4.3.2. Considerations for Software Defined Networking
"...SDN control plane often determines the end-to-end path ..."
[end-to-end implies more than intra-domain - seems too strong]
Per discussion of e2e, above, this now s/end-to-end path/path/
§4.6. Open-Loop Versus Closed-Loop
"feedback information may be in the form of historical information"
[Surely that would not be described as closed loop?]
I think this was push-back against "current" implying a very tight loop, which is often impractical.
Needed to allow information that has been gathered in the near past.
Changed to:
in the form of current measurement or recent
historical records
§5.1.2.1. Application-Layer Traffic Optimization
"...allows a network to publish its network information such as network
locations, costs between them at configurable granularities, and end-host
properties to network applications."
This sentence is hard to parse. Perhaps change the order and/or flag the items
in the list with a), b), c).
Fixed.
§5.1.2.2. Network Virtualization and Abstraction
"statistical packet bandwidth" [does this mean some form of effective
bandwidth?]
It contrasts with various transport technologies where bandwidth assignments are absolute and distinct as forwarding plane resources.
§5.1.3.2. RSVP
"RSVP has been extended to reserve resources for aggregation of flows"
[Cite RFC3175?]
Yes
§5.1.3.4. RSVP-TE
"...the paths of LSPs" -> LSPs [the P already stands for path]
Yeah. Trying to say the "route followed by the LSP" without using "route". But you're probably right about what to do here.
"To determine the paths for P2MP LSPs, selection of the branch points (based on
capabilities, network state, and policies) is key." [This problem is left
dangling. Is there at least a reference giving a solution?]
Yup. 5671.
§5.1.3.5. Generalized MPLS (GMPLS)
"TE extensions to MPLS (see Section 5.1.3.3)." -> 5.1.3.4
Yup
"These additions impact basic LSP properties: ..."
[Again, this problem is left dangling. Is there at least a reference giving
solutions to the ensuing list of apparently fundamental problems?]
Yes. This section is strangely lacking in references. Added some.
§5.1.3.12. Segment Routing
"...global context (network wide)" [this potentially gives the impression of an
Internet-wide lookup capability]
Yup. Should say "global context (domain wide)" to match with the terms used by the SR documents.
"BIER-TE does not of itself offer traffic engineering..." -> BIER-TE does not
offer a complete traffic engineering system... [Rather misleading - perhaps
better would be to move the sentence from the next para here: "...steers the
traffic within the network and forms an element of a traffic engineering
system."]
Ack
§6. Recommendations for Internet Traffic Engineering
The order of some of the sub-sections seems odd (e.g. 'measurement' and
'traffic mapping' after 'routing') and contrary to more logical ordering of the
TE process model in §3.
There seem to be lots more categories in 6 than in 3.
No change.
§6.1. Generic Non-functional Recommendations
"...a TE system should remain functional as the network expands with regard to
the number of routers and links, and with respect to the traffic volume."
[traffic volume -> number of flows (I don't believe the same number of flows
but carrying more volume would impact TE scaling at all)]
Well, it should remain functional in that case, shouldn't it?
So I will change this to "...number of flows and the traffic volume."
§6.4 Measurement Recommendations
hot-spot -> hot spot [consistent with 2 other occurrences]
According to another review, everything is now "hot-spot"
§6.5. Policing, Planning, and Access Control
"This is a simple way to check that the actual traffic volumes are consistent
with the planned volumes." [check -> enforce/ensure]
Ack
§6.6. Network Survivability
"Network capacity reserved in one layer to provide protection and restoration
is not available to carry traffic in a higher layer: it is not visible as spare
capacity in the higher layer." [Unclear whether these are statements of fact or
recommendations. Perhaps 'is' -> 'should be' (twice)]
It is fact, and the recommendation only comes in the final sentence of the bullet paragraph.
§7. Inter-Domain Considerations
a are -> are
Ack
"it is generally considered inadvisable for one domain to permit a control
process from another domain to influence the routing and management of traffic
in its network." [Surely this is inescapable, if TE in one domain moves
traffic to a different ingress of the next domain. Or is this intended to mean
"Don't open up an explicit control interface for other domains"?]
It's slightly more complicated than that. Imagine a source routing mechanism (such as SR): it is inadvisable to permit a packet entering a domain from another domain to specify explicit hops within the new domain. Consider a signaling protocol (such as RSVP-TE): it is not a good idea for a path establishment message that originates in one domain to explicitly identify hops in a second domain.
But, of course you're right that the traffic steering in one domain could have a significant impact on the load of the second domain, and that would change the TE options available and the actions that are needed. That is not a *direct* influence on the routing and management, just an effect on the load.
I think that no change is needed.
Could mention that L4 multipath transport protocols (whether controlled by
endpoints or in-network) were designed to shift traffic between domains (and
they are doing so).
Yeah. I added a few words. Interestingly, there is still the issue of visibility and trust for inter-domain information.
[BB] See last night's email.
§8. Overview of Contemporary TE Practices in Operational IP Networks
This section presents rather a large wall of unbroken text. Navigation markers
would be useful, perhaps based on the list in the 2nd para (altho I couldn't
see them all in the text).
Not a lot to do here, but I have tried a little formatting to break up the wall.
§13. Informative References
[Floyd94] -> [RFC3168]
Ah, thanks for that
§A.2 This Document
§5.1.3.4 & §5.1.3.13 are missing.
Nice
General (multiple sections)
I found a number of cases of repetition - probably just a symptom of age (of
the draft, not the editor ;)
Don't knock age. It's what got me this far.
Some of the repetitions are for clarity in a long document. Others are just poor editing.
* The definition of congestion was given 3 times, as already listed earlier;
I think this is solved as above.
Repeated explanation of the distinction between reactive and
proactive/preventative:
* "...can be both pro-active and reactive. In the pro-active case, the TE
control system takes preventive action..."
* Reactive Versus Preventive Congestion Management Schemes (the main bullet
item on this distinction)
* Network performance optimization can be corrective or perfective. In
corrective optimization,... In perfective...
* Prescriptive TE can be further categorized as either corrective or
perfective. Corrective TE prescribes ... Perfective TE...
It's a bit heavy, but I think it all helps.
(BTW, no hyphen in proactive.)
Ack
Extensions to link-state routing protocols are repeatedly listed and explained:
* "Examples of protocol extensions used to advertise network link state
information are defined in [RFC5305], [RFC6119], [RFC7471], [RFC8570], and
[RFC8571]."
* "taking into consideration the prevailing network state as advertised by IGP
extension for IS-IS in [RFC5305], for OSPFV2 in [RFC3630], and for OSPFv3 in
[RFC5329]"
* "[RFC5305] describes the extensions to the Intermediate System to
Intermediate System (IS-IS) protocol to support TE, similarly [RFC3630]
specifies TE extensions for OSPFv2, and [RFC5329] has the same description for
OSPFv3."
* "A number of enhancements to the link state IGPs allow them to distribute
additional state information required for constraint-based routing. The
extensions to OSPF are described in [RFC3630], and to IS-IS in..."
Yeah, I noticed this when editing. My feeling was that if it was a single document (rather than a suite) that was referenced, it would not stick out at all.
The places are fairly far apart in the document. So, for example, the four references to RFC 5305 are on lines 1337, 2192, 2329, and 2815.
So I class this as inelegant, but survivable.
I found the number of sentences that gratuitously contained 'X or not X'
started to irritate. A taxonomy will naturally divide practice into
alternatives, but there were a number of cases where such phrases were not used
to divide up the taxonomy, but seemed to be just waffle that could be removed
completely without subtracting anything. Some examples (some are only
marginally gratuitous):
Some TE solutions rely on these elements to a lesser or greater extent.
This determination may be made at a very coarse or very fine level.
Metrics that provide quantitative or qualitative measures
may allow the settings of the traffic control mechanisms to be manipulated by
external or internal entities
Delivery requirements of a specific set of packets may be specified explicitly
or implicitly.
derivation of solutions which may be implicitly or explicitly formulated
This process model may be enacted explicitly or implicitly
An SLA may explicitly or implicitly specify a Traffic Conditioning Agreement
using a set of shared or dedicated network resources
Yup. And if I had a professional editor, no doubt they would punish this style.
I'd like to say it's inherited from 3272, but it is substantially in the new text. So we have to blame the author set and this editor.
But, I have run out of energy to fix this sort of style issue.
[BB] Understandable
Super thanks for the amount of work you put into this review.
[BB] And thank you for taking it all in such a constructive spirit.
Cheers
Bob
Best,
Adrian
--
________________________________________________________________
Bob Briscoe http://bobbriscoe.net/