Thanks, Joel,

Wrt L2: Always good to explicitly suggest text when that's what you're after.
Not sure why i didn't include this sentence. I guess i didn't even have the idea
that one could enable/disable ACP separately on different L2 ports ;-)

Cheers
    Toerless

On Sat, Jul 04, 2020 at 08:59:57PM -0400, Joel M. Halpern wrote:
> My apologies for the delay in responding to these comments.
> The changes seem to nicely address all of my comments. I hope that I will
> recall this well enough to avoid introducing triple-jeopardy by accident.
> (Having said that, it appears that my pushing on some of these issues a
> second time contributed to your finding good resolutions of the issues.)
>
> On the 7.2 comments, my primary comment was a mistake on my part. The only
> configuration required is the same configuration that is required for ACP
> nodes, namely turning on ACP. (Which may or may not be a default setting,
> but is clearly a configurable behavior.)
>
> On the comment about a corner case, I was looking for text saying roughly
> "An L2 node that supports ACP and is enabled to participate SHOULD do so on
> all its L2 interfaces." I grant this is not a big deal. My concern is if the
> L2 link selection is a partial / proper subset of the intended L3
> adjacencies, problems could easily result due to traffic not arriving at all
> desired places.
>
> Thank you,
> Joel
>
> On 6/23/2020 10:35 PM, Toerless Eckert wrote:
> > Thanks a lot, Joel
> >
> > Personal diff with just the fixes for you, otherwise feel free to compare -25 against
> > -25, it has more fixes for Russ Housley and IPsec proto detail enhancements/fixes.
> >
> > http://tools.ietf.org/tools/rfcdiff/rfcdiff.pyht?url1=https://raw.githubusercontent.com/anima-wg/autonomic-control-plane/0caa400fd1c554ece49fddc7dabe8140195aa5bf/draft-ietf-anima-autonomic-control-plane/draft-ietf-anima-autonomic-control-plane.txt&url2=https://raw.githubusercontent.com/anima-wg/autonomic-control-plane/ae9e6cd856ab2706e8b38cc2552f2e77f6b676a5/draft-ietf-anima-autonomic-control-plane/draft-ietf-anima-autonomic-control-plane.txt
> >
> > Cheers
> >     toerless
> >
> > On Thu, Apr 09, 2020 at 07:16:16PM -0700, Joel Halpern via Datatracker wrote:
> > > Reviewer: Joel Halpern
> > > Review result: Not Ready
> > >
> > > Hello,
> > >
> > > I have been selected as the Routing Directorate reviewer for this draft. The
> > > Routing Directorate seeks to review all routing or routing-related drafts as
> > > they pass through IETF last call and IESG review, and sometimes on special
> > > request. The purpose of the review is to provide assistance to the Routing ADs.
> > > For more information about the Routing Directorate, please see
> > > http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir
> > >
> > > Although these comments are primarily for the use of the Routing ADs, it would
> > > be helpful if you could consider them along with any other IETF Last Call
> > > comments that you receive, and strive to resolve them through discussion or by
> > > updating the draft.
> > >
> > > Document: draft-ietf-anima-autonomic-control-plane-24.txt
> > > Reviewer: Joel Halpern
> > > Review Date: 9-April-2020
> > > IETF LC End Date: N/A
> > > Intended Status: Proposed Standard
> > >
> > > Summary:
> > > I have two major concerns about this document that I think should be
> > > resolved before publication. There are also a number of minor items that
> > > warrant attention.
> > >
> > > Comments:
> > >
> > > While quite long, the draft is significantly improved from earlier versions.
> > > It does provide significant explanation of its design choices, which is helpful
> > > and appreciated. Sometimes this seems to end up more as marketing or promotion
> > > instead of explanation, but this is mostly harmless.
> >
> > Any pointers to specific text that sounds too marketing-wise are
> > always welcome. Happy to review that. I thought i had eliminated all
> > the ones... i did see myself.
> >
> > > In particular, I would like to thank the authors and editors for the addition
> > > of section 9.3 and its careful discussion of the many issues there.
> >
> > Thank you.
> >
> > > Major Issues:
> > >
> > > Section 6.10.3.1 on the use of Zone-IDs seems, from the material in A.10.1,
> > > to be dependent upon either configuration (which ACP is supposed to avoid)
> > > or completely unspecified magic. Having an addressing and routing scheme
> > > standardized that is impossible to use seems at variance with appropriate
> > > practice. It would be fine to say that provision is made for non-zero
> > > Zone-IDs in the hope that future work can find ways to scale further using
> > > this. But pretending it is well-defined, but not actually defining it,
> > > seems unacceptable.
> >
> > You brought up this issue in your -13 review and we had a longer thread about
> > it which ended in, i think, this statement of yours:
> >
> > | <8d2d0b06-0982-53a3-0ce0-38a465f58bed@xxxxxxxxxxxxxxx>
> > | My perspective is that I would have preferred to see the system designed such
> > | that when Zones are needed, they can be added in a way that does not assume
> > | system-wide knowledge of the layout choices.
> > |
> > | I think you could have achieved that. I understand that the working group
> > | didn't do that. And because it is the WG decision, I can live with it. I wish
> > | it were better.
> >
> > No protection against double jeopardy in IETF ?
> > Maybe it's a good thing except for expediency of completion of the draft,
> > because i had to rethink the issue again, and while it took a while i
> > hope the result is especially good to help with initial ACP adoption
> > challenges:
> >
> > The included text in the discussion up to -24 is now also in my opinion not very
> > useful, it also had technical issues.
> > The core issue was the combination of your ask for complete removal
> > of the ACP zone address scheme and the (bad) solution of attempting to guess
> > at a future way how to provide the final full benefits for the zone address
> > scheme in 6.10.3.1. And that guesswork in 6.10.3.1 was not good, which is
> > why 6.10.3.1 is now gone in -25.
> >
> > [Side note: I think i have the zone solution for a future RFC kinda worked out:
> > we would have some manual or autonomic zone edges, GRASP announcements within each
> > zone announcing the Zone-ID and then nodes with ACP-Zone addresses that
> > attach into a zone would update the Zone-ID of their ACP address accordingly.
> > Et voila: zone-mobility by updating the Zone-ID field]
> >
> > However, the main disconnect was that this longer term goal alone is not
> > a good reason to keep Zone-ID. Instead the main reason to keep it was and is
> > the ability to support better partial and incremental adoption of ACP,
> > and for that one we do actually have good pre-standard implementation
> > experience.
> >
> > But i didn't want this in the normative part of the spec, and i either
> > didn't get the idea that it could go into the operational part (section 9),
> > or i felt such text would be too difficult or too much subject to additional
> > review attacks.
> >
> > But given how i think it is really an important option for initial
> > deployments of ACP in large networks, i finally wrote that, it is
> > now a new section 9.4. Pls check it out.
> >
> > > Section 6.12.5.1 on loopback interface is factually wrong.
> > > It conflates one particular form of loopback interface with
> > > the definition of loopback interfaces.
> > >
> > > This also leads to the error in the definition section (see
> > > minor comment below).
> >
> > Let me move this here so we can have a cohesive discussion about that section.
> >
> > > 6.12.5.1 refers to the ACP addresses as node addresses. Technically, the
> > > IPv6 architecture requires that all addresses are associated with
> > > interfaces rather than nodes. I would prefer that this draft not
> > > needlessly claim to violate that.
> >
> > In practice the term node address is often used, maybe less
> > in RFCs, but more in practice. And it's often used
> > interchangeably with loopback addresses because without changing
> > the actual IPv6 functionality, loopback interfaces are the
> > main way to achieve the function operators typically associate
> > with a node address.
> >
> > Be that as it may, i have tried to end the rewrite of the section
> > with a paragraph that is trying to bring the use of the word "Node"
> > in-line with the way RFC8402 does it.
> >
> > > (Loopback Interfaces were used long before RFC 4291,
> >
> > Yepp, was just bad English to connect something that was meant to be
> > read in the context of IPv4 with an example about IPv6.
> >
> > > and on routers were often used for external communication. This was itself
> > > a repurposing of the original loopback interface, 127.0.0.1, which was
> > > indeed for internal use.)
> >
> > Yepp.
> >
> > So, i ended up rewriting the whole section also because EricV asked
> > in his review earlier this year if it would not be better to use a new
> > term instead of loopback.
> >
> > When i reviewed existing normative references it became clear to me that
> > loopback is actually a very good logical name for the function we
> > need for addresses we want to behave as what non-dogmatic people
> > would call a node address.
> > So i hope the explanation in the new
> > text for loopback justifies the naming choice well.
> >
> > I have also added a bullet list for justifying the loopback address
> > use. Really nothing new, but common operational practice, alas, i
> > wasn't able to find a list like this in other docs and this
> > is an ongoing reason for questions from readers of ACP that do not
> > have a background in running IP router networks.
> >
> > So hopefully, while this point too took me a lot of time to
> > rewrite, it is all for the better.
> >
> > > Minor Issues:
> > >
> > > It seems distinctly unfortunate that the definition for Data Plane in
> > > section 2 explicitly states that this definition is different from that used
> > > in other work, including other routing work. This seems a recipe for both
> > > confusion and mis-communication among technologists.
> >
> > Actually, IMHO the term data-plane has always been badly defined in the
> > face of the inline-signaling model of IP networks. Are IGP/BGP signaling
> > packets data-plane or control-plane ? How about routers connecting via
> > L2 unbeknownst to them and their STP packets ? Even if you have an
> > opinion, do you have a normative RFC to support your definition ?
> > What is the difference between data and forwarding plane ?
> >
> > Don't answer... rhetorical questions...
> >
> > I have replaced two existing paragraphs in the intro with the following
> > text that explains the terminology better and shows how in the vision
> > of autonomic networks the term is very logical, and that it is just
> > existing non-autonomous networks in which there is more to the data-plane
> > than what you might expect, but i think that is perfectly fine, especially
> > when considering the layering example from above, where one layer's (L2, ethernet)
> > control and forwarding plane are just considered to be part of a higher layer's
> > data-plane.
> >
> > New text:
> > <t>In a fully autonomic network node without legacy control or management functions/protocols, the Data-Plane would be for example just a forwarding plane for "Data" IPv6 packets, aka: packets that are not forwarded by the ACP itself because they are control or management plane packets. In such networks/nodes, there would be no non-autonomous control or non-autonomous management plane. Routing protocols for example would be built inside the ACP as so-called autonomous functions via autonomous service agents, leveraging the ACP's functions instead of implementing them separately for each protocol: discovery, automatically established authenticated and encrypted local and distant peer connectivity for control and management traffic and common control/management protocol session and presentation functions.</t>
> >
> > <t>When the ACP is added to henceforth so-called non-autonomous nodes that have non-autonomous management plane and/or control plane functions, the ACP instead is best abstracted as a special Virtual Routing and Forwarding (VRF) instance (or virtual router) and the complete pre-existing non-autonomous management and/or control plane is considered to be part of the Data-Plane to avoid introduction of more complex, new terminology only for this case. Like the forwarding plane for "Data" packets, the non-autonomous control and management plane functions can then be managed/used via the ACP. This terminology is consistent with pre-existing documents such as <xref target="RFC8368"/>.</t>
> >
> > <t>In both instances (autonomous and non-autonomous nodes), the ACP is built such that it is operating in the absence of the Data-Plane, and in the case of existing non-autonomous (management, control) components in the Data-Plane also in the presence of any (mis-)configuration thereof.</t>
> > /New text
> >
> > > In the definition of in-band management in section 2, please remove the
> > > commentary text on putative fragility.
> > > (I actually agree it has some
> > > fragility. The discussion does not belong here. This is a definition.)
> > > The promotional material may be warranted, if jarring, in other parts of the
> > > documents. Not in the definitions please.
> >
> > Ok, i stripped down explanatory text for out-of-band network in terminology
> > and instead pimped what you would call "marketing" about it in the introduction section. Easy to find in diff.
> >
> > Always happy to get explicit suggestions for how to reduce what you think
> > is "jarring". The ability of ACP to avoid even a single case of sending
> > out a tech person to a remote site due to misconfigurations is IMHO
> > the biggest single use-case benefit in talks with customers, so i think it deserves
> > good factual representation and i can not see where the text goes beyond
> > that. I am happy if any positive pitching is called "marketing", but
> > i definitely do not want anything to be "jarring".
> >
> > > The definition of a loopback interface in section 2 is wrong. It claims
> > > that loopbacks transmit no external traffic. They send and receive lots
> > > of external traffic. They merely do so by forwarding the traffic
> > > internally to other interfaces. The traffic is external. The particular
> > > step of the transmission, if implemented naively, is internal.
> >
> > Fixed.
> >
> > > If we are going to define ACP as a virtual out of band network, I would
> > > suggest separating the terms into two definitions. One for true out of
> > > band networks (distinct physical links, switches, and ports), and then a
> > > definition for virtual out of band network which describes the ACP
> > > approximation which creates independence from configuration, but not
> > > independence from the physical links.
> >
> > Done.
> >
> > [ Note: I am btw. not worried about the link-sharing as a career limiting move
> > for ACP, as soon as there is sufficient link redundancy (2 links eliminate 99% of issues).
> >
> > The actual HW design of the nodes to maximize ACP value is more interesting.
> > I had slides about that in a research conference workshop some years back, e.g.: applying
> > concepts such as BMC so that you can use the common HW diag functions
> > you typically expect from OOB support. ]
> >
> > > Section 5, bullet 2, talks about a policy as to which peers ACP
> > > communication should be established. It would be helpful if this gave a
> > > reference or indication as to where such policies would come from. Given
> > > the emphasis on zero touch, I presume they are not configured on the node?
> > > (This issue was in my review of -13.)
> >
> > Original -13 thread here:
> >
> > | >> It is unclear how the flexible policy defined in section 5 bullet 2 (about
> > | >> which nodes are ACP peer candidates) is consistent with autonomic
> > | >> operation. It seems that the flexibility is important, so there should be
> > | >> some explanation here about how this is consonant with the stated goals. I
> > | >> understand that the bootstrap comes from BRSKI, but I do not think that is
> > | >> where the policy comes from?
> > | >
> > | > Would rather not like to add more suggestive text, and that's at best what
> > | > i could add. The default policy is the best "autonomic" behavior we know how
> > | > to make work: aka: try to connect ACP to all neighbors you can discover. And
> > | > we have only defined with DULL GRASP how to find subnet adjacent neighbors.
> > | >
> > | > The main reason to mention policy is so that there is some leeway to do
> > | > more or even (sigh) less than all direct neighbors.
> >
> > Double jeopardy ?
> >
> > I actually did not bother to fix up the intro section since taking the editor pen
> > from Michael. I had kept the "policy" in there as a reminder of Intent to be
> > done in the future, but given how we deprioritized
> > intent in charter, i felt more happy now than during -13 to fix this.
> >
> > Alas, it turns out i also found other points in the overview lacking
> > clarity and consistency with the normative sections, so the changes
> > here got larger, but hopefully all for the better. Please check.
> >
> > > Bullet 4 of section 6.1.3 on checking certificates against the CRL / OCSP
> > > would seem to be better reworded. I believe the intended requirement is
> > > that IF there is ACP connectivity to the CRL / OCSP source, then it should
> > > be verified. But that absence of such connectivity should not prevent
> > > association formation. (As, if I have read it right, otherwise we could
> > > deadlock the startup process.)
> >
> > Pls. check the full diff vs. -24 for this, because that fix is in the commit i did for
> > Russ Housley before i worked on your review. If you don't like that text either, pls
> > suggest better wording, it's a bit of a tricky language problem i think, which a native
> > speaker might master more easily.
> >
> > > In the example in section 6.5 on Channel selection, in steps 7:C1 and
> > > 11:C2, Node 1 concludes that it is Bob. However, in steps 12 and 13, the
> > > text refers to Node1 (Alice). This seems inconsistent.
> >
> > Yikes. How could that have slipped past me. Thanks a lot.
> > >
> > > Section 6.7.1 makes an assertion about the lack of need for MTI of security
> > > mechanisms. The earlier explanation was well done and seems sound. This
> > > shorter one seems wrong, since without MTI there is no good way to know
> > > what one's neighbors may implement. I suggest simply removing this text and
> > > replacing it with a backwards reference to the earlier description. (The
> > > rest of the section is useful and clear.)
> >
> > Done.
> >
> > > In 6.10.3, ACP Zone Addressing Sub-Scheme, the text claims that when zone
> > > IDs of 0 are used, the addresses are identifiers, and when non-zero IDs
> > > are used, they are locators.
> > > Since in either case the addresses are used
> > > for packet forwarding, and the addressing information is propagated in the
> > > routing protocol (RPL), this seems to be a misuse of the locator /
> > > identifier distinction. And a misuse for no purpose as the distinction is
> > > not relevant to the document. (This odd use of "identifier" continues in
> > > section 6.10.3.1. Identifier is not a synonym of "flat". Just say "flat".)
> >
> > Hey, i didn't come up with all this confusing and probably wrong understanding
> > of locator or identifier, i just fell into the trap of trying to use these terms ;-))
> >
> > This is removed now. Hope i found all places. Only locators left should be
> > about GRASP.
> >
> > Is there even any agreed upon distinction ? To me, identifier/locator are just
> > two roles an address can have based on who is using it for what purpose.
> > They're not exclusive to each other IMHO.
> >
> > > The assertion about looping packets in the later portion of 6.11.1.1 is
> > > over-stated. There are other routing protocols that avoid looping-till-ttl
> > > without changing the data plane header.
> >
> > > I suggest removing the gratuitous comparison with other routing protocols.
> >
> > Well... it was IMHO not gratuitous, it was just bad text.
> >
> > The intent was not to make the solution sound better than other routing protocols,
> > but rather to explain how it is not far worse than other routing protocols given
> > the absence of the RPI (RPL Packet Information).
> >
> > The text was not good because it only indirectly addressed what
> > it intended to describe by just talking about TTL looping.
> > I have replaced this
> > paragraph by two paragraphs that hopefully better capture the intent:
> >
> > [snip]
> > <t>
> > In RPL profiles where RPL Packet Information (RPI, see <xref target="rpl-Data-Plane"/>)
> > is present, it is also used to trigger reconvergence when misrouted, for example looping, packets
> > are recognized because of their RPI data. This helps to minimize RPL signaling traffic
> > especially in networks without stable topology and slow links.
> > </t>
> > <t>
> > The ACP RPL profile instead relies on quickly reconverging the DODAG by
> > recognizing link state change (down/up) and triggering reconvergence signaling
> > as described in <xref target="rpl-dodag-repair"/>. Since links in the ACP
> > are assumed to be mostly reliable (or have link layer protection against loss)
> > and because there is no stretch according to <xref target="rpl-dodag-repair"/>,
> > loops caused by loss of RPL routing protocol signaling packets should be exceedingly rare.
> > </t>
> > [/snip]
> >
> > Hope this is an adequate answer to close this point.
> >
> > I now have no text about TTL expiry because that is a difficult qualitative
> > comparison for which there is IMHO not enough data or evidence: The
> > reconvergence with RPL in the ACP profile may be somewhat slower than
> > the most common sub-50 msec LFA in SP networks or subsecond SPF-IGP
> > fast convergence common in most other networks in scope of ACP ("well managed",
> > aka: private enterprise etc. networks), but the total amount of traffic
> > across the ACP will likely be orders of magnitude less than that on the
> > Data Plane where the SPF-IGP runs.
> >
> > I think convergence with the profile should be 50 msec (link change discovery
> > plus O(max-pathlength) * per-node RPL processing latency), but i think
> > this is too much analysis for a spec, so no text.
> >
> > > Section 7.2 (L2 DULL GRASP) seems to be doing something quite useful. I
> > > think I see how it would work.
> > > The need for some configuration on some
> > > switches seems inevitable and acceptable.
> >
> > Hmm.. there is no intent to require configuration. What specifically
> > do you think of ?
> >
> > The goal is really to support ACP in complete L2-only networks, except that
> > the ACP itself is of course L3.
> >
> > One core part of the text is explaining how ACP can be supported
> > on the most limited L2 hardware where it can work. Aka: without changing
> > the actual L2 HW forwarding, but just by punting GRASP packets so they
> > are not flooded by L2.
> >
> > > I think there is one corner
> > > case that should be avoided, as it seems likely to create significant
> > > complexity for little or no benefit. It seems to me that a switch that is
> > > capable of participating in the ACP should either participate in the ACP on
> > > all its physical ports, or should not participate in the ACP at all. I
> > > would not be surprised if that was the WG intent. But I could not find the
> > > text that says this. (Apologies if it is there and I missed it.)
> >
> > Not sure why you specifically think this is an issue for devices
> > operating at L2.
> >
> > I have seen all types of weird problems. For example: How do you
> > enable autodiscovery of ACP neighbors across the 10Gbps backbone
> > interfaces of a router/switch for broadband if those interfaces
> > are initially disabled by software because the user is expected
> > to first enter an additional license key to use those interfaces....
> >
> > Sorry, random example. Maybe rephrase your point with an example
> > why you think it deserves additional text ? Suggest additional text ?
> >
> > >
> > > Section 9 starts by saying it is informational. But the first paragraph
> > > says that some of the content is "necessary" for correct operation. Thus,
> > > it seems that some of the content is normative? (I am not sure, but I
> > > think the "necessary" material relates to what is needed to be a registrar?)
> >
> > The first paragraph does not say "correct operation", and i seem
> > to remember that i wordsmithed that paragraph quite
> > a bit to walk the thin line: you can not build an ACP without
> > understanding this section and following its advice to
> > the extent you deem appropriate or feasible, but we can also not
> > normatively standardize what is in this section.
> >
> > Some things will hopefully get standardized via future
> > YANG model RFCs. That stuff is just not standardized because
> > it does not meet the formal bar.
> >
> > Most of the stuff is talking about a variety of options
> > deemed to be necessary or beneficial in various
> > situations. Doing even the YANG stuff for the subset
> > people will agree to is a lot of work.
> >
> > Protecting the ACP from operator
> > misconfiguration is IMHO necessary. I wouldn't even dare
> > to begin guessing what details could get standardized for
> > that. YANG models for new interface states would certainly
> > be another 5++ year discussion in IETF. Better to start
> > these things with vendor proprietary YANG models and learn.
> >
> > I know from personal experience that you can not successfully
> > deploy without humongous amounts of diagnostics as long as
> > you have buggy implementations, especially when fitting
> > into existing router OS, incurring a lot of unforeseen
> > limitations. Very difficult to standardize because it's
> > all about interaction with the non-autonomic stuff unless
> > you can severely isolate ACP in your platform design.
> >
> > If you do not understand the discussions about registrars,
> > you will have a hard time getting a working support
> > backend system for the registrars.
> >
> > Aka: necessary does not mean standardizable.
> >
> > > Nits:
> > > The second and third paragraphs of section 6.11.1.1 on RPL start with
> > > duplicated text, and then go on to say different (complementary) things.
> > > There is no need for the repetition.
> >
> > Right.
> > I reworked the overview to remove duplicates, also structured
> > into two subsections to highlight the two key themes of the profile
> > (single instance and convergence).
> >
> > > The rank factor in 6.11.1.6 of 100 megabits as the boundary seems a fairly
> > > arbitrary choice. It may be that an arbitrary choice was needed. Could
> > > something be said? In particular, if someone looks at this 5 years from
> > > now, it may seem quite confusing.
> >
> > In German, rule of thumb is called "pi times thumb", obviously much more
> > accurate than just thumb ;-)
> >
> > I added the following paragraph:
> >
> > <t>This is a simple rank differentiation between typical "low speed"
> > or "IoT" links that commonly max out at 100 Mbps and typical
> > infrastructure links with speeds of 1 Gbps or higher. Given how
> > the path selection for the ACP focusses only on reachability but
> > not on path cost optimization, no attempts at finer grained path
> > optimization are made. </t>
> >
> > Heard a nice summary about the new IEEE work about the future of 10 Mbps
> > ethernet over twisted pair, so i think the cut point at 100 Mbps
> > may actually be quite a good one. aka: with just two values i don't
> > know how we could do better.
> >
> > Again, thanks a lot for the review.
> >
> > Toerless
> >
--
---
tte@xxxxxxxxx
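
The two-valued rank differentiation described in the paragraph added for section 6.11.1.6 could be sketched roughly as below. This is only an illustration under stated assumptions: the function name, the constant names, and the numeric factor values are hypothetical and are not taken from the text quoted in this thread. The thread only establishes a cut point at 100 Mbps, with "low speed" links on one side and infrastructure links of 1 Gbps or higher on the other.

```python
# Hypothetical sketch of a two-valued, link-speed-based rank factor.
# The names and the factor values are assumptions made for illustration;
# only the 100 Mbps cut point comes from the discussion in this thread.

LOW_SPEED_THRESHOLD_BPS = 100_000_000  # 100 Mbps: typical "low speed"/"IoT" maximum
LOW_SPEED_FACTOR = 5    # assumed placeholder value for links at or below the cut point
HIGH_SPEED_FACTOR = 1   # assumed placeholder value for faster infrastructure links


def rank_factor(link_speed_bps: int) -> int:
    """Return a coarse rank factor for a link speed given in bit/s.

    Only two values are distinguished, since path selection here cares
    about reachability and not about fine-grained path cost optimization.
    """
    if link_speed_bps <= 0:
        raise ValueError("link speed must be positive")
    if link_speed_bps <= LOW_SPEED_THRESHOLD_BPS:
        return LOW_SPEED_FACTOR
    return HIGH_SPEED_FACTOR


if __name__ == "__main__":
    for speed in (10_000_000, 100_000_000, 1_000_000_000):
        print(speed, rank_factor(speed))
```

With only two values, a 10 Mbps and a 100 Mbps link are treated the same, which matches the stated intent of not attempting finer grained path optimization.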