Reviewer: Dale Worley
Review result: Ready with Nits

I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please treat these comments just like any other last call comments.

For more information, please see the FAQ at <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Document: draft-ietf-teas-gmpls-resource-sharing-proc-06
Reviewer: Dale R. Worley
Review Date: 12 Jan 2017
IETF LC End Date: 17 Jan 2017
IESG Telechat date: 2 Feb 2017

Summary: This draft is basically ready for publication, but has nits that should be fixed before publication.

There are various places where the wording of the draft is unclear. The draft would benefit from careful editing for clarity. In particular, there are a considerable number of places where the use of "the" and "a" and of plurals is not standard or leaves the text somewhat uncertain.

There are various references to ASSOCIATION objects, SESSION_ATTRIBUTE objects, etc. The text leaves it unclear where these objects live; it talks as if they exist in an abstract sense. I think I managed to track down what is going on in RFC 4872, which is that the Path message that sets up an LSP contains an array of objects and all of the objects described are parts of the respective LSP setup Path messages. I also suspect that the Path message objects are retained by the various nodes as permanent information about the LSPs that they have configured, so one can speak unambiguously of "the ASSOCIATION object of the LSP" long after the LSP is set up. If all of this is correct, it would help the naive reader if this was spelled out at the beginning of the document and/or the wording was changed in places to provide this context.
E.g.,

    GMPLS LSPs can share resources during LSP setup if they have Shared Explicit (SE) flag set in their SESSION_ATTRIBUTE objects and:

could be clarified as

    GMPLS LSPs can share resources during LSP setup if they have Shared Explicit (SE) flag set in the SESSION_ATTRIBUTE objects in the Path messages that create them and:

There are a number of terms that are unclear to me. It's possible that they have very standard meanings in GMPLS or traffic engineering, though. Is there a terminology section in a referenced RFC that could be pointed to to define these various words?

1. Introduction

    to setup Label Switched Paths (LSPs) in non-packet transport

The form "set up" is a verb, whereas "setup" is a noun (naming an instance of the action of setting up) or an adjective (specifying that something has to do with setting up). So in this instance, the wording should be "set up". Other uses of "setup/set up" should be checked also.

    As described in [RFC6689], an ASSOCIATION object can be used to identify the LSPs for restoration using Association Type set to "Recovery" [RFC4872] and also identify the LSPs for resource sharing using Association Type set to "Resource Sharing" [RFC4873].

The ordering of the phrases in this sentence is somewhat confusing because "using Association Type set to xxx" is a qualifier of "an ASSOCIATION object", yet the phrase "can be used to yyy" is between them. Clearer to say:

    As described in [RFC6689], an ASSOCIATION object with Association Type "Recovery" [RFC4872] can be used to identify the LSPs for restoration. Also, an ASSOCIATION object with Association Type "Resource Sharing" [RFC4873] can be used to identify the LSPs for resource sharing.

--

    Generally GMPLS end-to-end recovery schemes have the restoration LSP signaled after the failure has been detected and notified on the working LSP.

Is "signaled" used here in a standard way for GMPLS?
It seems that "the LSP is signaled" is to mean "the LSP is set up", but it took me some time to realize that. I am used to "X is signaled" meaning "a signal is sent to X". (There are many instances of this usage.)

It would also be useful for the reader to know the difference between "protection", "restoration", and "recovery". I think that "protection" is anti-failure paths set up *before* any failure, "restoration" is anti-failure paths set up *after* a failure, and "recovery" includes both "protection" and "restoration". Is this standard terminology within GMPLS, or should the reader be warned about it?

    In non-packet transport networks, as working LSPs are typically signaled over a nominal path,

What is the meaning of "nominal" here? ("nominal" has a number of meanings, some of which are largely contradictory.)

    can be reverted to the nominal path when the failure is repaired

In this context, the meaning of "reverted" is made clear by the clause "when the failure is repaired..." -- as opposed to other uses of "reverted".

    In this document, procedures are reviewed for

It's probably better to say "we review procedures for...".

    o When using end-to-end recovery with revertive mode, methods for LSP reversion and resource sharing are summarized in this document.

A definition of "revert/revertive/reversion" would be useful.

2. Overview

    The GMPLS end-to-end recovery scheme, as defined in [RFC4872] and being considered in this document, "fully dynamic rerouting switches normal traffic to an alternate LSP that is not even partially established only after the working LSP failure occurs. The new alternate route is selected at the LSP head-end node, it may reuse resources of the failed LSP at intermediate nodes and may include additional intermediate nodes and/or links".

It is awkward to visually coordinate the quotation marks in this paragraph. If it is important that the text is quoted from RFC 4872, given its length, it should be presented as a block-quote.
If not, the quotation marks should be omitted and just the reference given. If the intention is to quote this text, it should be corrected so that it matches the passage from RFC 4872. In particular, the difference between "fully dynamic rerouting" (in the draft) and "Full LSP rerouting (or restoration)" needs to be resolved, as there might be a difference in meaning.

The grammar does not join "The GMPLS end-to-end recovery scheme ..." and "... fully dynamic rerouting switches normal traffic". Perhaps something like:

    The GMPLS end-to-end recovery scheme, as defined in [RFC4872] and being considered in this document, switches normal traffic to an alternate LSP that is not even partially established only after the working LSP failure occurs. The new alternate route is selected at the LSP head-end node, it may reuse resources of the failed LSP at intermediate nodes and may include additional intermediate nodes and/or links.

--

    Two examples, 1+R and 1+1+R are described in the following sections.

At this point in the text, it's not clear what category these items are examples *of*. They aren't single recovery situations, as one would expect of something labeled "example". They seem to be sub-categories of "The GMPLS end-to-end recovery scheme". So it would be better to use phrasing like "Two forms of end-to-end recovery, ..., are described in the following sections." or "Two end-to-end recovery schemes/situations ...".

I assume that other variants of end-to-end recovery exist, and this draft is applicable to some/many/all of them. To guard against misunderstanding, it would be worth saying so by adding something like "Many other forms of end-to-end recovery exist, many of which [or whatever] can use these RSVP-TE signaling techniques."

Given that sections 2.1 and 2.2 form a pair of examples, it might be useful to distinguish them from "Resource Sharing By Restoration LSP" (which is not an example, and is not somehow an alternative to 1+R and 1+1+R) by renumbering the sections to:

    2. Overview
    2.1. Examples
    2.1.1. 1+R Restoration
    2.1.2. 1+1+R Restoration
    2.2. Resource Sharing By Restoration LSP

In that case, the introductory sentence "Two examples..." would move to the new section 2.1.

Where do the names "1+R" and "1+1+R" come from and do they have meaning beyond being arbitrary labels?

Also, given that the 1+1+R case is split into four sub-cases, it's not clear that the split between 1+R and 1+1+R is fundamental. It seems that there is an array of semi-independent choices: whether there is an ongoing protection LSP, how many restoration LSPs may be established (no more than the number of ongoing LSPs), how many failures of original LSPs must happen before restoration LSPs are established; various combinations of these choices yield various restoration techniques. Looked at that way, it might be worth combining both examples into one. But that has the problem that figure 2 looks considerably different from figure 1. OTOH, figure 2 isn't particularly accurate for the situation with two restoration LSPs, and perhaps those two cases should be split into another section with its own figure.

2.1. 1+R Restoration

    Unlike a protection LSP, a restoration LSP is signaled per need basis.

Is "restoration" a standard word in this field? If not, there should be some sort of terminology section that states clearly the difference between "protection" and "restoration".

2.2. 1+1+R Restoration

This paragraph could use rewording to be clearer:

    After a failure detection and notification on a working LSP or protecting LSP, a third LSP on path A-H-I-J-Z is established as a restoration LSP.

Since the working LSP has already been described, this should be "the working LSP".

    The restoration LSP in this case provides protection against a second order failure.

It would probably be better to explain what the "second order failure" is:

    The restoration LSP in this case provides protection against failure of both the working and protecting LSPs.

--

    During failure switchover with 1+1+R recovery scheme, in general, failed LSP resources are not released so that working, protecting and restoration LSPs coexist in the network. Nonetheless, a restoration LSP with the working LSP it is restoring as well as a restoration LSP with the protecting LSP it is restoring can share network resources.

For ease of reading, better to split the two cases apart, and not use "it is restoring" as we haven't introduced "restore" as a transitive verb:

    The restoration LSP can share network resources with the working LSP, and it can share network resources with the protecting LSP.

--

    Typically, restoration LSP is torn down when the failure on the original (working or protecting) LSP is repaired and the traffic is reverted to the original LSP.

Strictly,

    Typically, the restoration LSP is torn down when both the working and protecting LSPs are repaired and the traffic is reverted to the original LSP.

Except that's not correct, either. Probably the practice is that a restoration LSP is torn down when enough original LSPs are repaired to bring the failure count below the threshold that triggered the setting up of the restoration LSP (which varies among the four models). But that's awkward to write, even though that is the correct statement.

--

    In all models discussed, if the restoration LSP also fails, it is torn down and a new restoration LSP is signaled.

You can't say "the restoration LSP" because some of the models have more than one. Better

    In all these models, if a restoration LSP also fails, it is torn down and a new restoration LSP is signaled.

2.3. Resource Sharing By Restoration LSP

    it allows for resource sharing when the LSP traffic is dynamically restored after the link failure

The significance of this phrase isn't clear to me. One possible sense is that since the failure that is being discussed is the C-D link failure, then necessarily the resources from A to C can be reused. But that meaning doesn't work well here, because we haven't introduced what the failure is. (Also, you use the phrase "the link failure" before introducing what the link failure is.)

It seems like the potential for resource sharing is a property of the LSP that it might not have, but the text doesn't point that out clearly as an assumption of the example. Perhaps

    Using the network shown in Figure 3 as an example, LSP1 (A-B-C-D-E) is the working LSP, and assume it allows for resource sharing when the LSP traffic is dynamically restored.

--

    In this case, A-B-C-F-G-E is chosen as the restoration LSP path and the resources on the path segment A-B-C are re-used by this LSP when the working LSP is not torn down (e.g. in 1+R recovery scheme).

"when" isn't the right word here, because the re-using the resources doesn't wait for the working LSP to be not torn down. Perhaps:

    In this case, A-B-C-F-G-E is chosen as the restoration LSP path and the resources on the path segment A-B-C are re-used by this LSP. The working LSP is not torn down.

3.1. Restoration LSP Association

    For example, when a restoration LSP is signaled for a failed working LSP, the ASSOCIATION object in the restoration LSP contains the Association ID and Association Source set to the Association ID and Association Source signaled in the working LSP for the "Recovery" Association Type.

As a general question, where does the association object live? Clearly it isn't "in the restoration LSP".
It would be useful to mention this for readers who aren't fully familiar with the background:

    For example, when a restoration LSP is signaled for a failed working LSP, the ASSOCIATION object in the Path message that establishes the restoration LSP contains ...

3.2. Resource Sharing-based Restoration LSP Setup

    As described in [RFC3209], Section 2.5, the purpose of make-before-break is "not to disrupt traffic, or adversely impact network operations while TE tunnel rerouting is in progress". In non-packet transport networks, the label has a mapping into the data plane resource used and the nodes along the LSP need to send triggering commands to data plane for setting up cross-connections accordingly during the RSVP-TE signaling procedure. Due to the nature of the non-packet transport networks, a node may not be able to fulfill this purpose when sharing resources in some scenarios.

I can understand this paragraph, but I think it could benefit from a number of edits. The first is to remove the quotation marks, since the purpose is not to emphasize that RFC 3209 said those words, but rather that 3209 stated the same concept. And I think some of the explanation can be omitted without losing clarity.

    As described in [RFC3209], Section 2.5, the purpose of make-before-break is not to disrupt traffic, or adversely impact network operations while TE tunnel rerouting is in progress. In non-packet transport networks during the RSVP-TE setup procedure, the nodes along the LSP set up cross-connections accordingly. Because a cross-connection cannot simultaneously connect a shared resource to different resources in two alternative LSPs, nodes may not be able to fulfill this promise when LSPs share resources.

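As a cross-check of my reading of the node categories (C1, C2, C3) that the draft's table in this section defines, here is a minimal Python sketch. Nothing in it comes from the draft: the function name classify_nodes, the representation of a path as a list of node names, and the choice to count the client-facing interface at the head-end and tail-end as "reused" are all my assumptions. I made the last assumption only because it seems to be needed for the Figure 3 paths (working A-B-C-D-E, restoration A-B-C-F-G-E) to give the node lists stated in the table.

```python
def hops(path):
    """Return the set of directed hops (upstream, downstream) along a path."""
    return set(zip(path, path[1:]))


def classify_nodes(working, restoration):
    """Label each node on the restoration path C1, C2 or C3.

    C1: resources reused on both input and output interfaces.
    C2: resources reused on exactly one interface.
    C3: new resources on both interfaces.
    """
    shared = hops(working)
    last = len(restoration) - 1
    categories = {}
    for i, node in enumerate(restoration):
        # Assumption (mine, not the draft's): the client-facing interface
        # of the head-end and tail-end counts as a reused resource.
        in_reused = i == 0 or (restoration[i - 1], node) in shared
        out_reused = i == last or (node, restoration[i + 1]) in shared
        reused_count = int(in_reused) + int(out_reused)
        categories[node] = {2: "C1", 1: "C2", 0: "C3"}[reused_count]
    return categories


if __name__ == "__main__":
    result = classify_nodes(list("ABCDE"), list("ABCFGE"))
    for node, category in result.items():
        print(node, category)
```

If the draft intends a different treatment of the end nodes, the table's node lists (A & B, C & E, F & G) would need a sentence explaining why A and E fall where they do.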
--

    ---------+---------------------------------------------------------
    Category | Node Behavior during Restoration LSP Setup
    ---------+---------------------------------------------------------
       C1    + Reusing existing resource on both input and output
             + interfaces (nodes A & B in Figure 3).
             +
             + This type of node needs to book the existing
             + resources and no cross-connection setup
             + command is needed.
    ---------+---------------------------------------------------------

This would be prettier if most of the +'s were turned into |'s:

    ---------+---------------------------------------------------------
    Category | Node Behavior during Restoration LSP Setup
    ---------+---------------------------------------------------------
       C1    | Reusing existing resource on both input and output
             | interfaces (nodes A & B in Figure 3).
             |
             | This type of node needs to book the existing
             | resources and no cross-connection setup
             | command is needed.
    ---------+---------------------------------------------------------

Note that the items in the second column of the table are composed of two parts: The first part is the condition that defines which nodes are in that category, and the second part is the actions that will be taken by such nodes. Ideally, these would be broken out as separate columns. (The current first column provides the labels C1, C2, and C3, but those aren't referenced anywhere in the document, and could be omitted to save space.) That revises the table to look like this:

    ------------------------------------+----------------------------------
    Situation                           | Actions
    ------------------------------------+----------------------------------
    Reusing existing resources          | Book the existing resources.
    on both input and output interfaces | No cross-connection setup is
    (nodes A & B in Figure 3).          | needed.
    ------------------------------------+----------------------------------
    Reusing existing resource only on   | Book the resources.
    one of the interfaces (either input | Re-configure the cross-connection
    or output) and uses new resource on | to connect the re-used resource
    the other interface.                | to the new resource.
    (nodes C & E in Figure 3).          |
    ------------------------------------+----------------------------------
    Using new resources on both         | Book the new resources.
    interfaces.                         | Send the cross-connection setup
    (nodes F & G in Figure 3).          | command on both interfaces.
    ------------------------------------+----------------------------------

Is the meaning of "book" well-known? I find no use of it elsewhere in this document or in any of the references.

    Depending on whether the resource is re-used or not, the node behaviors differ.

Of course, the different behavior is only because we are here optimizing the establishment of the new LSP. A node could send a command to cross-connect two resources that are already connected.

    This deviates from normal LSP setup since some nodes do not need to re-configure the cross-connection, and it should not be viewed as an error.

Why would this (not sending a command to connect things that are already connected) be considered an error under any circumstances?

3.3. LSP Reversion

Is "reversion" a standard term?

    If the end-to-end LSP recovery is revertive, as described in Section 2 ...

I'm not sure how the phrase "If the end-to-end LSP recovery is revertive" works. "Recovery" seems to be a general term for techniques to recover from link failures and the like. Is this describing a "revertive" recovery method, or is it describing an instance of recovery which is somehow "revertive"? Compare to "revert", which seems to be the action of putting the traffic back on the original/protection LSP once its functionality is restored. I would expect that behavior to be universal.

    1. Make-while-break Reversion, where resources associated with a working or protecting LSP are reconfigured while removing reservations for the restoration LSP.

It's not clear to me what sort of reconfiguring is being discussed. Assuming that "reversion" means "when the working/protecting LSP starts working again, traffic is restored to that path", it's not clear what sort of reconfiguration would be needed, as the working/protecting LSP already exists.

I suspect that this issue shows up when the working/protecting LSP shares resources with the restoration LSP, and moving traffic to the restoration LSP may require reconfiguring resources, and so moving traffic back to working/protecting LSP may require reversing that reconfiguration. But the initial reconfiguration has not been mentioned. Should some sort of general description be put in "Resource Sharing By Restoration LSP" of the possible need to reconfigure when moving traffic to or from a restoration LSP? (This is all rather obvious, but it would help if it was clearly described.)

3.3.1. Make-while-break Reversion

    Removing reservations for restoration LSP triggers reconfiguration of resources associated with a working or protecting LSP on every node where resources are shared.

Could you add an explanation or pointer why this is so? It seems that for this to be true, the reservation process must broadcast an explicit prioritization between the new (restorative) reservation and the old (working) reservation, because the node that is reconfigured has to remember both reservations, and revert to the working one when the restorative one is deleted. It'd be useful for the naive reader to know where in RSVP-TE that information is broadcast and/or how RSVP-TE specified that nodes have to remember that information.

    Deletion of restoration LSPs is not a revertive process.

What is the meaning of "revertive process" here? It doesn't seem to match the sense of "revertive" as used elsewhere.

    In particular, if RSVP packets are lost due to nodal or DCN failures it is possible for an LSP to be only partially deleted.

"nodal" should probably be "node". What is "DCN"?
I can't find it in any of the referenced RFCs. Does "link" work as a replacement?

3.3.2. Make-before-break Reversion

    Instead of relying on deletion of restoration LSP, the head-end chooses to establish a new LSP to reconfigure resources on the working or protection LSP path, and uses identical ASSOCIATION and PROTECTION objects from the LSP it is replacing.

This could be made clearer by consistently labeling the new LSP as the "reversion" LSP. Also, state explicitly that its resources exactly duplicate the resources of the working/protection LSP that is being reverted:

    Instead of relying on deletion of the restoration LSP, the head-end chooses to establish a new "reversion" LSP that duplicates the configuration of the resources on the working or protection LSP, and uses identical ASSOCIATION and PROTECTION objects for that LSP.

--

    Reversion LSP is sharing resources both with working and restoration LSPs.

Better

    The reversion LSP shares all of the resources of the working/protection LSP and may share resources with the restoration LSP.

--

    Hence, after reversion LSP is created, data plane configuration essentially reflects working or protecting LSP reservations.

It seems like "essentially" is not needed, because the data plane configuration will *exactly* reflect the working/protecting LSP reservations. Or are there minor variations in how reservations are done that may not be exactly duplicated by the reversion LSP?

    After "make" part is finished, working and restoration LSPs are torn down.

Perhaps emphasize "the original working/protection and restoration LSPs are torn down", as the reversion LSP becomes the new working/protection LSP.

    o Rollback

      If "make" part fails, (existing) restoration LSP will still be used to carry existing traffic. Same logic applies here as for any MBB operation failure.

The reasoning here is not clear to me.
If the "make" operation fails, some of the nodes may be configured for the reversion LSP, while others will be configured for the restoration LSP. Or is it implicit that creating LSPs is an atomic operation network-wide, that incomplete LSP creations will be completely purged from the network?

If the latter is true, then the core of this discussion is that creating LSPs is atomic across the network, but *deleting* LSPs is not (and so make-while-break can fail to work). If that difference is true, it should be said explicitly somewhere near the beginning of section 3.3, as that fact is what is driving the whole discussion.