Sorry for the late reply.
See my responses inline, marked "#Ahmed:". The responses refer to version 15, which I just published to address your comments as well as other reviewers' comments.
Thanks
Ahmed
Reviewer: Theresa Enghardt
Review result: Ready with Issues
I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please treat these comments just like any other review comments. For more information, please see the FAQ at <https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.
Document: draft-ietf-rtgwg-bgp-pic-12
Reviewer: Theresa Enghardt
Review Date: 2021-01-10
IETF LC End Date: None
IESG Telechat date: Not scheduled for a telechat
Summary: The draft is basically ready for publication as an Informational RFC, but it has some context, clarity, and editorial issues that need to be fixed before publication.
Major issues: None.
Minor issues:
Abstract: "In the network comprising thousands of iBGP peers exchanging millions of routes, many routes are reachable via more than one next-hop. Given the large scaling targets, it is desirable to restore traffic after failure in a time period that does not depend on the number of BGP prefixes." This part is missing a logical step in the argumentation between these two sentences. Is the first statement a prerequisite for restoring traffic, and then the question is how to make it scalable? Is the first statement the reason for things not being scalable? Please rephrase to make the relationship between these statements and the overall argumentation clear. Is "depending on the number of BGP prefixes" an inherent feature of BGP, or are you making any implicit assumptions? If so, please state them.
#Ahmed:
First, let me answer your first question:
Is the first statement a prerequisite for restoring traffic, and then the question is how to make it scalable? Is the first statement the reason for things not being scalable?
The first statement sets the context for the second. The second statement says "Given the large scaling targets", and the first statement says where such "large scaling targets" come from: the "thousands of iBGP peers exchanging millions of routes".
Let me answer the second question:
Is "depending on the number of BGP prefixes" an inherent feature of BGP, or are you making any implicit assumptions? If so, please state them.
The last sentence in the paragraph contains no assumptions and
no feature description. It simply states a desired objective.
"In this document we proposed an architecture […]" What does architecture mean in this context? Without any further qualification, in a networking context, as a reader I assume that "architecture" means "network architecture", i.e., something that involves multiple nodes such as multiple BGP speakers. But it appears that the document is only about the internals of each individual BGP speaker, i.e., how information is organized within the router. So maybe it's "router architecture" or "software architecture" or such? Please rephrase to make this clear in the abstract. Please clarify your scope. As the abstract specifically mentions iBGP, is this solution only about iBGP? Or is it about eBGP as well?
#Ahmed: the context of the term "architecture" is explained in
the sentence immediately following this sentence when we say
"organizing the forwarding data structure"
#Ahmed: for 'iBGP', I modified it to "BGP"
Introduction: The introduction is missing a clear problem statement. Perhaps it's implicitly stated by saying that "convergence speed is limited by the time taken to serially propagate reachability information from the point of failure to the device that must re-converge.", but please be specific. Is this convergence speed that depends on information propagation time considered "too long", and therefore it needs to be reduced? Is it "too long" specifically in certain contexts, e.g., networks of a certain size? As the document actually appears to focus on speeding up changes within a single node, it's not clear how this relates to propagation time. Does the node-internal speedup also speed up how fast propagated information converges? Why? As the statement about reachability information being exchanged is the first sentence of the introduction, this makes it seem like it's fundamental to your document. If this is not the case, please consider starting the introduction with a clear problem statement that is actually fundamental to your document, such as "The way that information is currently organized within a BGP speaker [under … circumstances] is inefficient [for … reason] and leads to long convergence times."
#Ahmed: I removed the first two statements in the introduction.
As for the problem statement, it is already given in the abstract,
where we say "it is desirable to restore traffic after failure in
a time period that does not depend on the number of BGP prefixes."
In the next sentence, "BGP speakers exchange reachability information about prefixes […]", the relationship to the problem statement is still not clear. Is this reachability information insufficient? Is there already enough information to converge faster, and now your solution allows converging faster? Or something else?
#Ahmed: This statement and the remainder of the paragraph state facts about BGP: it exchanges prefixes and can select more than one path for each prefix. Exchanging routes and selecting more than one path (whether ECMP or primary/backup) is fundamental to the FIB architecture that we are proposing, as mentioned in the second paragraph of the introduction.
"[…] for labeled address families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128 […]" - Please expand these acronyms on first use and provide a reference.
#Ahmed: "AFI/SAFI" are clearly defined in reference [2], which is referred to in the same sentence where these two acronyms are mentioned.
"[…] an edge router assigns local labels to prefixes and associates the local label with each advertised prefix […]" Does this apply to incoming advertisements, outgoing advertisements, or both? Please make the context clear here.
#Ahmed: The end of the sentence says "using BGP label unicast technique[3]". [3] clearly explains the term "advertised prefix" in our draft.
"[…] such as L3VPN [7], 6PE [8], and Softwire [6] using BGP label unicast technique[3]." The "such as" is not entirely clear: If these are examples of the technique that the rest of the sentence describes, perhaps "using technologies such as" would be more clear. However, as the entire sentence is already very long, please consider splitting the sentence and make the relationship between the statements clear.
#Ahmed: I used "using technologies such as" as you suggested. The statement provides a list of items, so it is not so long that it needs to be split.
Please expand NLRI on first use and perhaps provide a definition or reference.
#Ahmed: I put reference [1], where the term is clearly defined, right beside the term NLRI.
How does the proposal in this document relate to the techniques you mention, i.e., L3VPN, 6PE, and Softwire? Does it require them? Is their usage optional for your solution, but helps (and why)? Please make the relationship of your solution to these techniques explicit and state the prerequisites of your solution, if any.
#Ahmed: It seems like there is a misunderstanding. Label unicast is a method to advertise labels with prefixes. In order to do that, a router associates a local label with every prefix that it advertises as a "labeled unicast" advertisement. What our draft is trying to do is to make the convergence for these advertised prefixes independent of the number of these prefixes.
In other words, labeled unicast (and the list of examples) is a
given, and our draft proposes a technique to make convergence fast.
It does not say that "in order to make convergence fast label
unicast must be used".
"This document proposes a hierarchical and shared forwarding chain organization […]" What is your solution an alternative to? How has information previously been organized? How does the concept of a forwarding chain relate to the details you already gave, which were about a BGP speaker exchanging reachability information and applying path selection - where does the forwarding chain come in? As this appears to be a fundamental concept to your solution, please introduce it in the first paragraph.
#Ahmed:
1. Let me answer the first question: "What is your solution an alternative to?"
Listing alternatives is really out of scope of this document.
2. Second question: "How has information previously been organized?"
Again, other solutions are really outside the scope of this document, especially when such solutions are internal router behavior that vendors may or may not expose.
3. Third question: "How does the concept of a forwarding chain relate to the details you already gave, which were about a BGP speaker exchanging reachability information and applying path selection"
BGP speakers exchanging reachability information is necessary for other BGP speakers to figure out the paths to reach a destination. "Applying path selection" is necessary for a BGP speaker to calculate more than one path to a prefix. Our proposal uses multi-path to make convergence independent of the number of prefixes, as mentioned in the second paragraph of the introduction as well as in the abstract. So exchanging paths and path selection relate directly to our solution.
"incrementally deployed and enabled with zero operator intervention" Well, deploying and enabling any solution does require operator intervention, e.g., a software update, correct? So perhaps that's zero other operator intervention? Minimal operator intervention? Or not requiring a specific type of operator intervention that would otherwise be needed? Later in Section 3.1, the draft says "It is noteworthy to mention that the forwarding chain is constructed without any operator intervention at all.", so perhaps it's possible to further qualify what kind of operator intervention would otherwise be necessary, but is not necessary with your solution - e.g., no operator intervention is required to reconfigure routes when a link fails.
#Ahmed: The term "enabled with zero operator intervention" refers to the enablement of the "BGP-PIC" technique that we are proposing, not to how software/hardware is provisioned in networks. However, to make things clearer, I added the following sentence at the end of the second paragraph in the introduction:
In other words, once it is implemented and deployed on a router, nothing is required from the operator to make it work.
As for the comment that refers to section 3.1, again, other
techniques and alternatives to our proposal are not within the
scope of the proposal itself.
1.1 Terminology Please expand on first usage and consider defining: AFI/SAFI, PE, CE, NLRI, forwarding plane, VPN RD's (probably VPN RDs), LSR, ASBRs, BGP-LU, FIB manager (is this a particular entity? A software component?) You don't have to define all BGP terms that you use, but please expand them once to make it easier to guess what they stand for or to look them up.
#Ahmed: I have done the following:
AFI/SAFI: They are first mentioned in the introduction, and reference [2] is referred to in the same sentence where they are mentioned.
NLRI: I added reference [1] right next to the term where it is first mentioned in the introduction.
PE and CE: I attached reference [7], where they are clearly defined, beside the first use of each of them.
LSR: I added reference [4] right next to the first use of the term.
ASBR: This term is first used in the statement right after the one where inter-AS option C is mentioned with reference [7]. So a reader not familiar with ASBR is obviously not familiar with inter-AS either and should refer to reference [7]. However, I added reference [7] right next to where ASBR is first mentioned.
BGP-LU: I added this acronym to the introduction right after the place where the term "BGP labeled unicast" is first mentioned.
FIB: I added "(Forwarding Information Base)" next to the first
use of FIB.
FIB manager: I added "(software or hardware entity responsible for managing the FIB)" right next to where the term is first mentioned.
For "Leaf", "IP leaf", "Label leaf": Why is it called leaf? In graph theory, isn't the leaf of a tree the node with no children and only one parent? In your figures, the "IP leaf" appears to have no parent and instead two children. So isn't it more of a root in the tree? Later, you mention the pathlist being "the parent" of the IP leaf, but in Figure 2, you have an arrow from the IP leaf pointing to the Pathlist, so to me that looks like the Pathlist is the child of the IP leaf. Is this a BGP convention? If so, perhaps a sentence stating that would help, and/or a reference.
#Ahmed: We have defined the term "leaf" in section 1.1. I do not understand the cause of the confusion.
#Ahmed: As for the arrows in the diagram, we defined the term
"dependency" in section 1.1. The arrows point from a child to
its parent.
"OutLabel-List: Each labeled prefix is associated with an OutLabel-List. The OutLabel-List is an array of one or more outgoing labels and/or label actions where each label or label action has 1-to-1 correspondence to a path in the pathlist. Label actions are: push the label, pop the label, swap the incoming label with the label in the Outlabel-Array entry, or don't push anything at all in case of "unlabeled". The prefix may be an IGP or BGP prefix" What are labels/label actions in this context? Are labels the same labels mentioned in the introduction, i.e., local labels that are assigned to prefixes? Are "outgoing labels" still local? Maybe here a brief explanation of how labels are defined and how they work would help.
#Ahmed: I do not understand what is not clear in this definition.
It says:
"The OutLabel-List is an array of one or more outgoing labels and/or label actions where each label or label action has 1-to-1 correspondence to a path in the pathlist."
The label actions are also clearly defined in the same paragraph.
Explaining how labels are defined and how they work is a very big topic on which a huge number of references, standards, white papers, research papers, etc. have been published, and it is certainly out of the scope of this document.
2. Overview: "A forwarding plane that supports multiple levels of indirection: A forwarding that starts with a destination and ends with an outgoing interface is not a simple flat structure." What is "A forwarding"? Do you mean a forwarding entry? Is this the same thing as a route? Please consider adding a definition to the terminology. Is a forwarding plane the same as a forwarding chain (mentioned in the abstract)? If so, please unify your terminology. If not, please define the terms and explain what the differences are.
#Ahmed: Thanks for catching the missing word. I added the word "chain" as you pointed out.
2.1.2. Availability of more than one BGP next-hops "The existence of a secondary next-hop is clear for the following reason: a service caring for network availability will require two disjoint network connections hence two BGP next-hops." By "the existence is clear" you mean "The existence is clearly required" or "It is clear whether a secondary next-hop exists" or something else?
#Ahmed: Again, thanks for catching the missing word. I added "clearly required" as you pointed out.
2.2 BGP-PIC Illustration "We can see that the BGP pathlist consisting of BGP-NH1 and BGP-NH2 is shared by all NLRIs reachable via ePE1 and ePE2." How can we see that? ePE1 and ePE2 do not show up in Figure 2. I assume they map to something that is shown, but it's not clear what.
#Ahmed:
It is the figure AND the list of prefixes before the figure that clearly show that. The list of prefixes with the outgoing paths shows the following:
65000: 198.51.100.0/24
via ePE1 (192.0.2.1), VPN Label: VPN-L11
via ePE2 (192.0.2.2), VPN Label: VPN-L21
65000: 203.0.113.0/24
via ePE1 (192.0.2.1), VPN Label: VPN-L12
via ePE2 (192.0.2.2), VPN Label: VPN-L22
Both prefixes are reachable via the same two next-hops
via ePE1 (192.0.2.1)
via ePE2 (192.0.2.2)
and only the VPN labels differ per prefix. Hence the phrase "shared by all prefixes".
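As a rough illustration of the sharing described above, here is a minimal Python sketch. The names `PathList`, `Leaf`, and `out_labels` are hypothetical and not taken from the draft, and keying the labels by next-hop address is a simplification of the OutLabel-List. Only the prefixes, next-hops, and VPN labels come from the list above.

```python
from dataclasses import dataclass


@dataclass
class PathList:
    """Ordered list of BGP next-hops, shared by many leaves (hypothetical name)."""
    next_hops: list


@dataclass
class Leaf:
    """One prefix: a reference to a shared pathlist plus its own labels."""
    prefix: str
    pathlist: PathList   # shared object, not a per-prefix copy
    out_labels: dict     # next-hop -> VPN label for this prefix (simplified)

    def usable_paths(self):
        # Pair each next-hop still in the shared pathlist with this prefix's label.
        return [(nh, self.out_labels[nh]) for nh in self.pathlist.next_hops]


shared = PathList(["192.0.2.1", "192.0.2.2"])  # ePE1, ePE2
leaves = [
    Leaf("198.51.100.0/24", shared,
         {"192.0.2.1": "VPN-L11", "192.0.2.2": "VPN-L21"}),
    Leaf("203.0.113.0/24", shared,
         {"192.0.2.1": "VPN-L12", "192.0.2.2": "VPN-L22"}),
]

# ePE1 becomes unreachable: a single update to the shared pathlist,
# no matter how many leaves point at it.
shared.next_hops.remove("192.0.2.1")

for leaf in leaves:
    print(leaf.prefix, leaf.usable_paths())
```

Because both leaves hold a reference to the same pathlist object, the single removal is visible to every prefix at once, which is the intuition behind convergence that does not depend on the number of prefixes.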
3.2. Example: Primary-Backup Path Scenario Comparing Figure 3 to Figure 2, there's a couple of differences in terminology: Figure 2 has an "IP Leaf" and Figure 3 has an "IP prefix leaf" called VPN-IP1. Are "IP Leaf" and "IP prefix leaf" the same concept? If so, please unify your terminology. Same question for VPN-L11 being "OutLabel-List" (Figure 2) and "Label-leaf" (Figure 3), VPN-L21 being part of an "OutLabel-List" (Figure 2) and "BGP OutLabel Array" (Figure 3), and BGP-NH1 being part of a "Pathlist" (Figure 2) and "BGP Pathlist" (Figure 3). Figure 3 does not appear to show any Adjacency - why? Figure 2 does not appear to show any label actions - Why? Furthermore, making the figures more similar stylistically (e.g., having "IP prefix leaf" being always underlined or always in brackets) would help for comparing the two figures.
#Ahmed:
For the use of "Outlabel-list" vs "outlabel-array", thanks for
catching the few uses of the term "array". The intention was to
show that most likely the "Outlabel-list" will be implemented as
an "array". But I agree that terminology consistency is important.
So I replaced "array" with "list" in the few places where it was used.
For the confusion between "IP prefix leaf" and "IP leaf", I changed the only two uses of "IP prefix leaf" to "IP leaf" to avoid the confusion.
4. Forwarding Behavior "apply the label action of the label on the packet" What does this mean? Does "push" mean that the forwarding engine will add the label to the packet? How will this label be used? Will it be removed from the packet later? Will it be sent in a BGP advertisement? Please make this clearer here, and/or please explain what labels and label actions are earlier, and how they are used.
#Ahmed: The terms "push", "pop", etc. are well-known terms in MPLS. I added a reference to RFC3031 "MPLS Architecture" at the place where label actions are first mentioned in Section 1.1.
"the forwarding engine applies a hashing algorithm to choose the path and the hashing at the BGP level yields path 0 while the hashing at the IGP level yields path 1" This sounds like ECMP, i.e., there's multiple paths and each packet is hashed and then sent through a path based on the hash. But the earlier sections sounded like your solution was more about primary paths and secondary failover paths. Are these two general approaches and your solution works for either? Please make this explicit, possibly early in the document.
#Ahmed: The beginning of the paragraph says:
Let’s apply the above forwarding steps to the forwarding chain depicted in Figure 2 in Section 2.
Figure 2 in section 2 has ECMP. Besides, the abstract (as well as
other places in the document) clearly says:
traffic can be re-routed to ECMP or pre-calculated backup paths
So the solution applies to both ECMP and primary/backup.
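To make the two-level path choice concrete, here is a small Python sketch under stated assumptions. The function names, the use of CRC32 as the hash, the per-level salt, and the IGP next-hop names are all hypothetical; the draft does not prescribe any of them. It only shows one hash at the BGP level and another at the IGP level, each selecting an index into its own pathlist.

```python
import zlib


def pick(paths, flow_key, salt):
    """Choose one path by hashing the flow key; the salt separates the levels."""
    index = zlib.crc32(salt + flow_key) % len(paths)
    return index, paths[index]


# Hypothetical two-level chain: a BGP pathlist on top and
# one IGP pathlist per BGP next-hop underneath.
bgp_pathlist = ["BGP-NH1", "BGP-NH2"]
igp_pathlists = {
    "BGP-NH1": ["IGP-NH1", "IGP-NH2"],
    "BGP-NH2": ["IGP-NH3", "IGP-NH4"],
}


def forward(flow_key):
    """Resolve a flow through both levels of the chain."""
    bgp_index, bgp_nh = pick(bgp_pathlist, flow_key, b"bgp")
    igp_index, igp_nh = pick(igp_pathlists[bgp_nh], flow_key, b"igp")
    return bgp_index, bgp_nh, igp_index, igp_nh


print(forward(b"10.0.0.1->198.51.100.7"))
```

In a primary/backup arrangement the hash would be replaced by "use the first usable path in the pathlist", but the walk through the two levels stays the same.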
5.1. Flattening the Forwarding Chain "Suppose the platform cannot support the number of hierarchy levels in the forwarding chain. FIB needs to reduce the number of hierarchy levels. […]" When in the process does this flattening happen? Only when a packet is forwarded, like in the above steps, or does it happen when the chain is first constructed? Does the flattening happen after a specific step in the above process, e.g., step 3, or is it independent? If it happens for each forwarded packet, this seems like a lot of steps. How is the overall efficiency still maintained?
#Ahmed: I changed the second sentence in the paragraph to
FIB manager needs to reduce the number of hierarchy levels when programming the forwarding chain in the FIB.
to make clear that the flattening occurs at the time of programming the FIB, not at the time of forwarding the packet.
I also described what FIB manager means where it is first mentioned
in Section 3.1.
6.1. BGP-PIC core "When a remote link or node fails, IGP on the ingress PE receives advertisement indicating a topology change so IGP re-converges to either find a new next-hop and/or outgoing interface or remove the path completely from the IGP prefix used to resolve BGP next-hops." Why IGP, when this document is about BGP? Is implied by the scope "when a core link or node fails but the BGP next-hop remains reachable"? If so, please make this explicit.
#Ahmed: IGP was mentioned multiple times before this section.
Other protocols are also mentioned, such as LDP and SR. Also, even
though the name of the document is BGP-PIC, that does not mean we
will only talk about BGP. In fact, a good portion of the document
does NOT talk about BGP. Instead it talks about FIB, forwarding
chains, labels, etc. So mentioning IGP in the document is not
strange or out of context.
As for your question:
Is implied by the scope "when a core link or node fails but the BGP next-hop remains reachable"?
The first line in section 6.1 says:
This section describes the adjustments to the forwarding chain when a core link or node fails but the BGP next-hop remains reachable.
"As soon as the IGP convergence is complete for the BGP next-hop IGP route, all its BGP depending routes benefit from the new path." What would happen in a scenario where BGP-PIC is not used? Would it take longer until the BGP routes can use the new path, and why?
#Ahmed: Again, describing other methods is beyond the scope of this document.
6.2.2 "the edge node attached to the failed link performs next-hop self" - What does "perform next-hop self" mean? Is there a word missing here, e.g., "lookup"?
#Ahmed: I added
(where BGP advertises the IP address of its own loopback as next-hop)
to explain what next-hop self means
"The main observation is that the loss of convergence speed due to the loss of hierarchy depth" Does convergence depend on the exchange of BGP messages between BGP peers, or is the concept of convergence defined differently here? It seems like here convergence means something related to how information is stored/updated locally on the router, which is not what I would think about when I read "BGP convergence". (Related to the comment at the beginning of the introduction: What is your problem statement, i.e., what is the type of convergence you are talking about and that your solution speeds up?)
#Ahmed: The term convergence means the router is able to forward traffic to the destination as long as the destination is reachable.
8. Security Considerations Are you sure that there are no security considerations? For example, if there is a bug in the implementation of this technique, could this make BGP prefix hijacking easier given a specific use of BGP labels?
#Ahmed: Bugs are always a security consideration in all computer-based systems. AFAIK most IETF drafts and RFCs do not list bugs as a security consideration. Most notably (in the context of our draft), RFC4271 (BGP standard) and RFC4272 (BGP security vulnerabilities analysis) do not talk about bugs.
Nits/editorial comments:
Abstract: "In the network comprising thousands of iBGP peers" -> "In a network comprising thousands of iBGP peers"
#Ahmed: Corrected
Please expand BGP-PIC on first use.
#Ahmed: done
1.1 Terminology "A prefix P/m (of any AFI/SAFI) that is learnt via an Interior Gateway Protocol, such as OSPF and ISIS, has a path for." - Is this sentence missing a subject for the "has a path for"? If this is "A prefix that an IGP has a path for", then the "is learnt via" does not fit in the sentence.
#Ahmed: Corrected
"one or more prefix" -> "one or more prefixes"
#Ahmed: Corrected
"a IP prefix" -> "an IP prefix"
#Ahmed: Corrected
There's a stray ") in the "Pathlist" item.
#Ahmed: Corrected
"may not necessarily has" -> "may not necessarily have"
#Ahmed: Corrected
"the forwarding engine must visits" -> "the forwarding engine must visit"
#Ahmed: Corrected
Please make all your terminology items consistent, i.e., sentences ending with a full stop or not.
#Ahmed: added '.' to all of them
"A pathlist may contain a mix of primary and backup paths" - why is this its own item? Isn't it about the previous item, "Pathlist", and should be part of the same bullet point item?
#Ahmed: Corrected
2.2.1 Hierarchical Hardware FIB "the number of memory lookup's" -> "the number of memory lookups"
#Ahmed: Corrected
5.1. Flattening the Forwarding Chain Please unify how you write your terms, e.g., "OutLabel-list" vs. "outlabel-list" (Section 5.1) Please unify whether you capitalize all words in your headings or just some.
#Ahmed: Unified