Re: [Last-Call] Genart last call review of draft-ietf-rtgwg-bgp-pic-12

Ahmed Bashandy <abashandy.ietf@xxxxxxxxx> · Fri, 20 Aug 2021 17:33:16 -0700

    Sorry for the late reply.
    See my responses inline, marked #Ahmed. The responses refer to
      version 15, which I just published to address your comments as
      well as other reviewers' comments.
    Thanks
    Ahmed

    On 1/10/21 1:48 PM, Theresa Enghardt
      via Datatracker wrote:

      Reviewer: Theresa Enghardt
Review result: Ready with Issues

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other review comments.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document: draft-ietf-rtgwg-bgp-pic-12
Reviewer: Theresa Enghardt
Review Date: 2021-01-10
IETF LC End Date: None
IESG Telechat date: Not scheduled for a telechat

Summary: The draft is basically ready for publication as an Informational RFC,
but it has some context, clarity, and editorial issues that need to be fixed
before publication.

Major issues: None.

Minor issues:

Abstract:

"In the network comprising thousands of iBGP peers exchanging millions
of routes, many routes are reachable via more than one next-hop.
Given the large scaling targets, it is desirable to restore traffic
after failure in a time period that does not depend on the number of
BGP prefixes."
This part is missing a logical step in the argumentation between these two
sentences. Is the first statement a prerequisite for restoring traffic, and
then the question is how to make it scalable? Is the first statement the reason
for things not being scalable? Please rephrase to make the relationship between
these statements and the overall argumentation clear. Is "depending on the
number of BGP prefixes" an inherent feature of BGP, or are you making any
implicit assumptions? If so, please state them.

    #Ahmed: 

    First let me answer your first question 

    Is the first statement a prerequisite for restoring traffic, and
then the question is how to make it scalable? Is the first statement the reason
for things not being scalable? 
    The first statement sets the context for the second sentence. The
      second statement says "Given the large scaling targets". The first
      statement states where such "large scaling targets" are coming
      from. They are coming from the "thousands of iBGP peers exchanging
      millions of routes"
    Let me answer the second question: 

    Is "depending on the
number of BGP prefixes" an inherent feature of BGP, or are you making any
implicit assumptions? If so, please state them.
    The last sentence in the paragraph contains no assumptions and no
      feature description. It simply states a desired objective. 

"In this document we proposed an architecture […]"
What does architecture mean in this context? Without any further qualification,
in a networking context, as a reader I assume that "architecture" means
"network architecture", i.e., something that involves multiple nodes such as
multiple BGP speakers. But it appears that the document is only about the
internals of each individual BGP speaker, i.e., how information is organized
within the router. So maybe it's "router architecture" or "software
architecture" or such? Please rephrase to make this clear in the abstract.

Please clarify your scope. As the abstract specifically mentions iBGP, is this
solution only about iBGP? Or is it about eBGP as well?

    #Ahmed: the context of the term "architecture" is explained in
      the sentence immediately following this sentence when we say
      "organizing the forwarding data structure"

    #Ahmed: for 'iBGP', I modified it to "BGP" 

Introduction:

The introduction is missing a clear problem statement. Perhaps it's implicitly
stated by saying that "convergence speed is limited by the time taken to
serially propagate reachability information from the point of failure to the
device that must re-converge.", but please be specific. Is this convergence
speed that depends on information propagation time considered "too long", and
therefore it needs to be reduced? Is it "too long" specifically in certain
contexts, e.g., networks of a certain size? As the document actually appears to
focus on speeding up changes within a single node, it's not clear how this
relates to propagation time. Does the node-internal speedup also speed up how
fast propagated information converges? Why? As the statement about reachability
information being exchanged is the first sentence of the introduction, this
makes it seem like it's fundamental to your document. If this is not the case,
please consider starting the introduction with a clear problem statement that
is actually fundamental to your document, such as "The way that information is
currently organized within a BGP speaker [under … circumstances] is inefficient
[for … reason] and leads to long convergence times."

    #Ahmed: I removed the first two statements in the introduction
    As for the problem statement, it is already mentioned in the abstract
      when we said "it is desirable to restore traffic after failure in
      a time period that does not depend on the number of BGP prefixes."

In the next sentence, "BGP speakers exchange reachability information about
prefixes […]", the relationship to the problem statement is still not clear. Is
this reachability information insufficient? Is there already enough
information to converge faster, and now your solution allows converging faster?
Or something else?

    #Ahmed: This statement and the remainder of the paragraph state
    facts about BGP: it exchanges prefixes and can select more than
    one path for each prefix. Exchanging routes and selecting more than
    one path (whether ECMP or primary/backup) is fundamental to the FIB
    architecture that we are proposing, as mentioned in the second
    paragraph of the introduction

"[…] for labeled address families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128
[…]" - Please expand these acronyms on first use and provide a reference.

    "AFI/SAFI" are clearly defined in reference [2], which is referred
    to in the same sentence where these two acronyms are mentioned

"[…] an edge router assigns local labels to prefixes and associates the
local label with each advertised prefix […]"
Does this apply to incoming advertisements, outgoing advertisements, or both?
Please make the context clear here.

    #Ahmed: The end of the sentence says "using BGP label unicast
    technique[3]". [3] clearly explains the term "advertised prefix" in
    our draft

"[…] such as L3VPN [7], 6PE
[8], and Softwire [6] using BGP label unicast technique[3]."
The "such as" is not entirely clear: If these are examples of the technique
that the rest of the sentence describes, perhaps "using technologies such as"
would be more clear. However, as the entire sentence is already very long,
please consider splitting the sentence and make the relationship between the
statements clear.

    #Ahmed: I used "using technologies such as" as you suggested. The
    sentence provides a list of items, so it is not so long that it
    needs to be split

Please expand NLRI on first use and perhaps provide a definition or reference.

    #Ahmed: I put reference [1] right beside the term NLRI where the
    term is clearly defined

How does the proposal in this document relate to the techniques you mention,
i.e., L3VPN, 6PE, and Softwire? Does it require them? Is their usage optional
for your solution, but helps (and why)? Please make the relationship of your
solution to these techniques explicit and state the prerequisites of your
solution, if any.

    #Ahmed: It seems like there is a misunderstanding. Label unicast
      is a method to advertise labels with prefixes. In order to do
      that, a router associates a local label with every prefix that it
      advertises as a "labeled unicast" advertisement. What our draft is
      trying to do is to make the convergence for these advertised
      prefixes independent of the number of these prefixes
    In other words, labeled unicast (and the list of examples) is a
      given and our draft proposes a technique to make convergence fast,
      not that "in order to make convergence fast label unicast must be
      used"

      "This document proposes a hierarchical and shared forwarding chain
organization […]"
What is your solution an alternative to? How has information previously been
organized? How does the concept of a forwarding chain relate to the details you
already gave, which were about a BGP speaker exchanging reachability
information and applying path selection - where does the forwarding chain come
in? As this appears to be a fundamental concept to your solution, please
introduce it in the first paragraph.

    #Ahmed: 

    1. Let me answer the first question "What is your solution an
      alternative to?"
    Listing alternatives is really out of scope of this document
    2. Second question: "How has information previously been
      organized?"
    Again, other solutions are really outside the scope of this
      document, especially when such solutions are internal router
      behavior that vendors may or may not expose
    3. 3rd question "How does the concept of a forwarding chain
      relate to the details you already gave, which were about a BGP
      speaker exchanging reachability information and applying path
      selection"
    A BGP speaker exchanging reachability is necessary for other BGP
      speakers to figure out the paths to reach a destination. "applying
      path selection" is necessary for a BGP speaker to calculate more
      than one path to a prefix. Our proposal uses multi-path to make
      convergence independent of the number of the prefixes as it is
      mentioned in the second paragraph of the introduction as well as
      the abstract. So exchanging paths and path selection directly
      relates to our solution.

"incrementally deployed and enabled with zero operator intervention"
Well, deploying and enabling any solution does require operator intervention,
e.g., a software update, correct? So perhaps that's zero other operator
intervention? Minimal operator intervention? Or not requiring a specific type
of operator intervention that would otherwise be needed? Later in Section 3.1,
the draft says "It is noteworthy to mention that the forwarding chain is
constructed without any operator intervention at all.", so perhaps it's
possible to further qualify what kind of operator intervention would otherwise
be necessary, but is not necessary with your solution - e.g., no operator
intervention is required to reconfigure routes when a link fails.

    #Ahmed: The term "enabled with zero operator intervention" refers
      to the enablement of the "BGP-PIC" technique that we are
      proposing, not to how software/hardware is provisioned in
      networks. However to make things clearer I added the following
      sentence at the end of the second paragraph in the introduction

    In other words, once it is implemented and deployed on a router,
      nothing is required from the operator to make it work.
    As for the comment that refers to section 3.1, again, other
      techniques and alternatives to our proposal are not within the
      scope of the proposal itself.

1.1 Terminology

Please expand on first usage and consider defining: AFI/SAFI, PE, CE, NLRI,
forwarding plane, VPN RD's (probably VPN RDs), LSR, ASBRs, BGP-LU, FIB manager
(is this a particular entity? A software component?) You don't have to define
all BGP terms that you use, but please expand them once to make it easier to
guess what they stand for or to look them up.

    #Ahmed I have done the following
    AFI/SAFI: They are first mentioned in the introduction and
      reference [2] is referred to in the same sentence where they are
      mentioned
    NLRI: I added reference [1] right next to the term when it is
      first mentioned in the Introduction
    PE and CE: I attached reference [7] beside the first use of each
      of them where they are clearly defined
    LSR: I added reference [4] right next to the first use of the
      term
    ASBR: This term is first used in the statement right after the
      statement where inter-AS option C with reference [7] is
      mentioned. So a reader not familiar with ASBR is obviously not
      familiar with inter-AS and should refer to reference [7]. However,
      I added reference [7] right next to where ASBR is first mentioned
    BGP-LU: I added this acronym to the introduction right after the
      place where the term "BGP labeled unicast" is first mentioned
    FIB: I added "(Forwarding Information Base)" next to the first
      use of FIB. 

    FIB manager: I added "(software or hardware entity responsible
      for managing the FIB)" right next to where the term is first
      mentioned 

For "Leaf", "IP leaf", "Label leaf": Why is it called leaf? In graph theory,
isn't the leaf of a tree the node with no children and only one parent? In your
figures, the "IP leaf" appears to have no parent and instead two children. So
isn't it more of a root in the tree? Later, you mention the pathlist being "the
parent" of the IP leaf, but in Figure 2, you have an arrow from the IP leaf
pointing to the Pathlist, so to me that looks like the Pathlist is the child of
the IP leaf. Is this a BGP convention? If so, perhaps a sentence stating that
would help, and/or a reference.

    #Ahmed: We have defined the term "leaf" in section 1.1. I do not
      understand what the cause of the confusion is
    #Ahmed: As for the arrows in the diagram, we defined the term
      "dependency" in section 1.1. The arrows show the direction from a
      child to a parent. 

"OutLabel-List: Each labeled prefix is associated with an
          OutLabel-List. The OutLabel-List is an array of one or more
          outgoing labels and/or label actions where each label or label
          action has 1-to-1 correspondence to a path in the pathlist.
          Label actions are: push the label, pop the label, swap the
          incoming label with the label in the Outlabel-Array entry, or
          don't push anything at all in case of "unlabeled". The prefix
          may be an IGP or BGP prefix"
What are labels/label actions in this context? Are labels the same labels
mentioned in the introduction, i.e., local labels that are assigned to
prefixes? Are "outgoing labels" still local? Maybe here a brief explanation of
how labels are defined and how they work would help.

    #Ahmed: I do not understand what is not clear in this definition.
      It says 

    "The OutLabel-List is an array of one or more outgoing labels
      and/or label actions where each label or label action has 1-to-1
      correspondence to a path in the pathlist."
    The label actions are also clearly defined in the same paragraph
    Explaining how labels are defined and how they work is a very big
      topic on which a huge number of references, standards, white
      papers, research papers, etc. have been published, and it is
      certainly out of the scope of this document
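
For readers less familiar with MPLS, the quoted OutLabel-List definition can be read as a small data structure. The Python sketch below is illustrative only and is not taken from the draft: the names LabelAction, OutLabelEntry, Leaf and apply_action are hypothetical, labels are plain integers, and the label stack is assumed to be a list whose last element is the top of the stack.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class LabelAction(Enum):
    PUSH = "push"            # push the label
    POP = "pop"              # pop the top label
    SWAP = "swap"            # swap the incoming label with the entry's label
    UNLABELED = "unlabeled"  # do not push anything


@dataclass(frozen=True)
class OutLabelEntry:
    action: LabelAction
    label: Optional[int] = None  # only meaningful for PUSH and SWAP


@dataclass
class Leaf:
    prefix: str
    pathlist: List[str]                 # one next-hop per path
    outlabel_list: List[OutLabelEntry]  # entry i belongs to path i

    def __post_init__(self) -> None:
        # Enforce the 1-to-1 correspondence between paths and entries.
        if len(self.pathlist) != len(self.outlabel_list):
            raise ValueError("OutLabel-List must have one entry per path")


def apply_action(stack: List[int], entry: OutLabelEntry) -> List[int]:
    """Return a new label stack after applying one OutLabel-List entry."""
    if entry.action is LabelAction.PUSH:
        return stack + [entry.label]
    if entry.action is LabelAction.POP:
        return stack[:-1]
    if entry.action is LabelAction.SWAP:
        return stack[:-1] + [entry.label]
    return list(stack)  # UNLABELED: leave the packet as it is


# Hypothetical numeric labels standing in for VPN-L11 and VPN-L21.
leaf = Leaf(
    prefix="198.51.100.0/24",
    pathlist=["192.0.2.1", "192.0.2.2"],
    outlabel_list=[
        OutLabelEntry(LabelAction.PUSH, 1011),
        OutLabelEntry(LabelAction.PUSH, 1021),
    ],
)
```

Whichever path index the forwarding engine selects in the pathlist, the same index selects the entry to apply from the OutLabel-List.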

      2. Overview:

"A forwarding plane that supports multiple levels of indirection:
A forwarding that starts with a destination and ends with an
outgoing interface is not a simple flat structure."
What is "A forwarding"? Do you mean a forwarding entry? Is this the same thing
as a route? Please consider adding a definition to the terminology. Is a
forwarding plane the same as a forwarding chain (mentioned in the abstract)? If
so, please unify your terminology. If not, please define the terms and explain
what the differences are.

    #Ahmed: Thanks for catching the missing word. I added the word
    "chain" as you pointed out.

      2.1.2. Availability of more than one BGP next-hops

"The existence of a secondary next-hop is clear for the following
reason: a service caring for network availability will require two
disjoint network connections hence two BGP next-hops."

By "the existence is clear" you mean "The existence is clearly required" or "It
is clear whether a secondary next-hop exists" or something else?

    #Ahmed: again thanks for catching the missing word. I added "clearly
    required" as you pointed out

2.2 BGP-PIC Illustration

"We can see that the BGP
pathlist consisting of BGP-NH1 and BGP-NH2 is shared by all NLRIs
reachable via ePE1 and ePE2."
How can we see that? ePE1 and ePE2 do not show up in Figure 2. I assume they
map to something that is shown, but it's not clear what.

    #Ahmed:
    It is the figure AND the list of prefixes before the figure that
      clearly show that. The list of prefixes with the outgoing paths
      shows the following

          65000: 198.51.100.0/24
             via ePE1 (192.0.2.1), VPN Label: VPN-L11
             via ePE2 (192.0.2.2), VPN Label: VPN-L21

          65000: 203.0.113.0/24
             via ePE1 (192.0.2.1), VPN Label: VPN-L12
             via ePE2 (192.0.2.2), VPN Label: VPN-L22

    Both prefixes have the same two next-hops

             via ePE1 (192.0.2.1)
             via ePE2 (192.0.2.2)

    and only the VPN labels differ per prefix. Hence the phrase that
      "shared by all prefixes" 
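
The sharing described in this answer can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the draft's data model: the class names Path, Pathlist and VpnLeaf, the `usable` flag and the helper mark_next_hop_down are all hypothetical. The two prefixes, next-hops and VPN labels are the ones listed above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Path:
    next_hop: str
    usable: bool = True


@dataclass
class Pathlist:
    paths: List[Path]


@dataclass
class VpnLeaf:
    prefix: str
    pathlist: Pathlist     # shared by every prefix with the same next-hops
    vpn_labels: List[str]  # per prefix, entry i belongs to path i

    def usable_paths(self) -> List[Tuple[str, str]]:
        return [
            (path.next_hop, label)
            for path, label in zip(self.pathlist.paths, self.vpn_labels)
            if path.usable
        ]


def mark_next_hop_down(pathlist: Pathlist, next_hop: str) -> None:
    """One update on the shared object, however many leaves point at it."""
    for path in pathlist.paths:
        if path.next_hop == next_hop:
            path.usable = False


# One pathlist object for ePE1 and ePE2, referenced by both prefixes.
shared = Pathlist([Path("192.0.2.1"), Path("192.0.2.2")])
leaf1 = VpnLeaf("198.51.100.0/24", shared, ["VPN-L11", "VPN-L21"])
leaf2 = VpnLeaf("203.0.113.0/24", shared, ["VPN-L12", "VPN-L22"])
```

Because both leaves reference the same Pathlist object, a single call to mark_next_hop_down changes what both prefixes resolve to, while each prefix keeps its own VPN labels.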

3.2. Example: Primary-Backup Path Scenario

Comparing Figure 3 to Figure 2, there's a couple of differences in terminology:
Figure 2 has an "IP Leaf" and Figure 3 has an "IP prefix leaf" called VPN-IP1.
Are "IP Leaf" and "IP prefix leaf" the same concept? If so, please unify your
terminology. Same question for VPN-L11 being "OutLabel-List" (Figure 2) and
"Label-leaf" (Figure 3), VPN-L21 being part of an "OutLabel-List" (Figure 2)
and "BGP OutLabel Array" (Figure 3), and BGP-NH1 being part of a "Pathlist"
(Figure 2) and "BGP Pathlist". Figure 3 does not appear to show any Adjacency -
why? Figure 2 does not appear to show any label actions - Why? Furthermore,
making the figures more similar stylistically (e.g., having "IP prefix leaf"
being always underlined or always in brackets) would help for comparing the two
figures.

    #Ahmed:
    For the use of "Outlabel-list" vs "outlabel-array", thanks for
      catching the few uses of the term "array". The intention was to
      show that most likely the "Outlabel-list" will be implemented as
      an "array". But I agree that terminology consistency is important.
      So I replaced the few places where "array" was used with "list" 

    For the confusion of using "IP prefix leaf" and "IP leaf", I
      changed the only two usages of "IP prefix leaf" to "IP leaf" to
      avoid the confusion

4. Forwarding Behavior

"apply the label action of the label on the packet"
What does this mean? Does "push" mean that the forwarding engine will add the
label to the packet? How will this label be used? Will it be removed from the
packet later? Will it be sent in a BGP advertisement? Please make this clearer
here, and/or please explain what labels and label actions are earlier, and how
they are used.

    #Ahmed: the terms "push", "pop",.., are well known terms in MPLS. I
    added a reference to RFC3031 "MPLS Architecture" to the place where
    label actions are first mentioned in Section 1.1

"the forwarding engine applies a hashing algorithm to choose the path and
the hashing at the BGP level yields path 0 while the hashing at the
IGP level yields path 1"
This sounds like ECMP, i.e., there's multiple paths and each packet is hashed
and then sent through a path based on the hash. But the earlier sections
sounded like your solution was more about primary paths and secondary failover
paths. Are these two general approaches and your solution works for either?
Please make this explicit, possibly early in the document.

    #Ahmed The beginning of the paragraph says

      Let’s apply the above forwarding steps to the forwarding chain
        depicted in Figure 2 in Section 2. 

    Figure 2 in section 2 has ECMP. Besides, the abstract (as well as
      other places in the document) clearly says 

      traffic can be re-routed to ECMP or pre-calculated backup paths

    so the solution applies to both ECMP and primary/backup
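
The per-level hashing that the reviewer quotes can be sketched as follows. This is a simplified Python illustration and not the draft's forwarding procedure: the CRC32-based flow_hash stands in for whatever hash a platform uses, the table layout is an assumption, and the IGP labels and interface names ("IGP-L11", "if-1" and so on) are placeholders. Only BGP-NH1, BGP-NH2, VPN-L11 and VPN-L21 come from the text discussed above.

```python
import zlib
from typing import Callable, Dict, List, Tuple

Flow = Tuple[str, str, int, int, int]  # src, dst, protocol, sport, dport
HashFn = Callable[[Flow, str], int]
Table = Dict[str, Dict[str, List[str]]]


def flow_hash(flow: Flow, level: str) -> int:
    """Deterministic per-level hash of a flow (stand-in for a hardware hash)."""
    return zlib.crc32(repr((level, flow)).encode("utf-8"))


# BGP level: prefix -> BGP next-hops and the VPN label for each path.
BGP_LEAVES: Table = {
    "198.51.100.0/24": {
        "pathlist": ["BGP-NH1", "BGP-NH2"],
        "outlabels": ["VPN-L11", "VPN-L21"],
    },
}

# IGP level: BGP next-hop -> outgoing interfaces and the IGP label for each.
IGP_LEAVES: Table = {
    "BGP-NH1": {"pathlist": ["if-1", "if-2"], "outlabels": ["IGP-L11", "IGP-L12"]},
    "BGP-NH2": {"pathlist": ["if-1", "if-2"], "outlabels": ["IGP-L21", "IGP-L22"]},
}


def forward(prefix: str, flow: Flow, hash_fn: HashFn = flow_hash) -> Tuple[List[str], str]:
    """Walk the chain and return (label stack, outermost first; interface)."""
    bgp = BGP_LEAVES[prefix]
    i = hash_fn(flow, "bgp") % len(bgp["pathlist"])  # choice at the BGP level
    igp = IGP_LEAVES[bgp["pathlist"][i]]
    j = hash_fn(flow, "igp") % len(igp["pathlist"])  # choice at the IGP level
    return [igp["outlabels"][j], bgp["outlabels"][i]], igp["pathlist"][j]


flow: Flow = ("203.0.113.7", "198.51.100.9", 6, 40000, 443)


def quoted_case(_flow: Flow, level: str) -> int:
    # Force the case in the quoted text: path 0 at BGP level, path 1 at IGP level.
    return 0 if level == "bgp" else 1


labels, interface = forward("198.51.100.0/24", flow, quoted_case)
```

With quoted_case, the walk selects BGP-NH1 with VPN-L11 and then the second IGP path of BGP-NH1. A primary/backup pathlist would use the same walk, with the index chosen by path state instead of a hash.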

5.1. Flattening the Forwarding Chain

"Suppose the platform cannot support the number of hierarchy levels
in the forwarding chain. FIB needs to reduce the number of hierarchy
levels. […]"
When in the process does this flattening happen? Only when a packet is
forwarded, like in the above steps, or does it happen when the chain is first
constructed? Does the flattening happen after a specific step in the above
process, e.g., step 3, or is it independent? If it happens for each forwarded
packet, this seems like a lot of steps. How is the overall efficiency still
maintained?

    #Ahmed:  I changed the second sentence in the paragraph to 

      FIB manager needs to reduce the number of hierarchy levels when
        programming the forwarding chain in the FIB. 

    to illustrate that the process of flattening occurs at the time
      of programming the FIB, not at the time of forwarding the packet
    I also described what FIB manager means when it is first mentioned
      in Section 3.1
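
To make the "at programming time" point concrete, here is a generic Python sketch of collapsing two hierarchy levels into one before the chain is installed. It is not the exact procedure of Section 5.1: the function name flatten, the tuple layout, and the labels and interface names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

UpperPath = Tuple[str, str]       # (next-hop, label used at the upper level)
LowerPath = Tuple[str, str]       # (outgoing interface, label at the lower level)
FlatPath = Tuple[str, List[str]]  # (outgoing interface, label stack, outermost first)


def flatten(upper: List[UpperPath], lower: Dict[str, List[LowerPath]]) -> List[FlatPath]:
    """Expand every upper-level path over the lower-level paths that resolve it."""
    flat: List[FlatPath] = []
    for next_hop, upper_label in upper:
        for interface, lower_label in lower[next_hop]:
            flat.append((interface, [lower_label, upper_label]))
    return flat


# Hypothetical two-level chain: two upper paths, each resolved over two lower paths.
upper_paths = [("NH1", "L-upper-1"), ("NH2", "L-upper-2")]
lower_paths = {
    "NH1": [("if-1", "L-lower-11"), ("if-2", "L-lower-12")],
    "NH2": [("if-1", "L-lower-21"), ("if-2", "L-lower-22")],
}

# Done once by the FIB manager when programming the FIB, not per packet.
flattened = flatten(upper_paths, lower_paths)
```

The forwarding engine then performs one lookup in the flattened list per packet. The cost is that the flattened list is no longer shared with the lower level, which is the trade-off the draft discusses as a loss of hierarchy depth.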

6.1. BGP-PIC core

"When a remote link or node fails, IGP on the ingress PE receives
advertisement indicating a topology change so IGP re-converges to
either find a new next-hop and/or outgoing interface or remove the
path completely from the IGP prefix used to resolve BGP next-hops."
Why IGP, when this document is about BGP?
Is this implied by the scope "when a core link or node fails but the BGP next-hop
remains reachable"? If so, please make this explicit.

    #Ahmed: IGP was mentioned multiple times before this section.
      Other protocols are also mentioned, such as LDP and SR. Also, even
      though the name of the document is BGP-PIC, that does not mean we
      will only talk about BGP. In fact, a good portion of the document
      does NOT talk about BGP. Instead it talks about FIB, forwarding
      chains, labels, etc. So mentioning IGP in the document is not
      something strange or out of context

    As for your question 

    Is this implied by the scope "when a core link or node fails but the BGP next-hop
remains reachable"?
    The first line in section 6.1 says

    This section describes the adjustments to the forwarding chain
      when a core link or node fails but the BGP next-hop remains
      reachable.

"As soon as the IGP convergence is
complete for the BGP next-hop IGP route, all its BGP depending
routes benefit from the new path."
What would happen in a scenario where BGP-PIC is not used? Would it take longer
until the BGP routes can use the new path, and why?

    #Ahmed: Again describing other methods is beyond the scope of this
    document
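
The behaviour quoted from Section 6.1 follows from the indirection in the forwarding chain, which the Python sketch below tries to show. It is a toy model under stated assumptions: IgpRoute, BgpLeaf and igp_reconverge are hypothetical names, the interface names are placeholders, and the 1000 generated prefixes are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IgpRoute:
    prefix: str                # IGP route used to resolve a BGP next-hop
    out_interfaces: List[str]  # current outgoing interfaces


@dataclass
class BgpLeaf:
    prefix: str
    via: IgpRoute  # reference to the shared IGP route, not a copy


def igp_reconverge(route: IgpRoute, new_interfaces: List[str]) -> None:
    """Single update performed when the IGP finds a new path."""
    route.out_interfaces = list(new_interfaces)


# The BGP next-hop stays reachable; only the core path towards it changes.
next_hop_route = IgpRoute("192.0.2.1/32", ["if-1"])
leaves = [BgpLeaf(f"10.{i // 256}.{i % 256}.0/24", next_hop_route) for i in range(1000)]

igp_reconverge(next_hop_route, ["if-2"])
```

After the single call to igp_reconverge, every one of the 1000 BGP leaves resolves through the new interface without being touched individually, which is the sense in which the repair time does not depend on the number of BGP prefixes.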

6.2.2
"the edge node attached to the failed
link performs next-hop self" - What does "perform next-hop self" mean? Is there
a word missing here, e.g., "lookup"?

    #Ahmed I added

      (where BGP advertises the IP address of its own loopback as
        next-hop)

    to explain what next-hop self means

      "The main observation is that the loss of convergence speed due to
the loss of hierarchy depth"
Does convergence depend on the exchange of BGP messages between BGP peers, or
is the concept of convergence defined differently here? It seems like here
convergence means something related to how information is stored/updated
locally on the router, which is not what I would think about when I read "BGP
convergence". (Related to the comment at the beginning of the introduction:
What is your problem statement, i.e., what is the type of convergence you are
talking about and that your solution speeds up?)

    #Ahmed: The term convergence means the router is able to forward
    traffic to the destination as long as the destination is reachable

8. Security Considerations

Are you sure that there are no security considerations?
For example, if there is a bug in the implementation of this technique, could
this make BGP prefix hijacking easier given a specific use of BGP labels?

    #Ahmed: Bugs are always a security consideration in all
    computer-based systems. AFAIK most of the IETF drafts and RFCs do
    not list bugs as a security consideration. Most notably (in the
    context of our draft) RFC4271 (BGP standard) and RFC4272 (BGP
    security vulnerabilities analysis) do not talk about bugs

Nits/editorial comments:

Abstract:

"In the network comprising thousands of iBGP peers" -> "In a network comprising
thousands of iBGP peers"

    #Ahmed: Corrected

Please expand BGP-PIC on first use.

    #Ahmed: done

1.1 Terminology

"A prefix P/m (of any AFI/SAFI) that is learnt via
an Interior Gateway Protocol, such as OSPF and ISIS, has a path
for." - Is this sentence missing a subject for the "has a path for"? If this is
"A prefix that an IGP has a path for", then the "is learnt via" does not fit in
the sentence.

    #Ahmed: Corrected

      "one or more prefix" -> "one or more prefixes"

    #Ahmed: Corrected

      "a IP prefix" -> "an IP prefix"

    #Ahmed: Corrected

      There's a stray ") in the "Pathlist" item.

    #Ahmed: Corrected

      "may not necessarily has" -> "may not necessarily have"

    #Ahmed: Corrected

      "the forwarding engine must visits" -> "the forwarding engine must visit"

    #Ahmed: Corrected

      Please make all your terminology items consistent, i.e., sentences ending with
a full stop or not.

    #Ahmed: added '.' to all of them 

"A pathlist may contain a mix of primary and backup paths" - why is this its
own item? Isn't it about the previous item, "Pathlist", and should be part of
the same bullet point item?

    #Ahmed: Corrected

2.2.1 Hierarchical Hardware FIB

"the number of memory lookup's" -> "the number of memory lookups"

    #Ahmed: Corrected

5.1. Flattening the Forwarding Chain

Please unify how you write your terms, e.g., "OutLabel-list" vs.
"outlabel-list" (Section 5.1)

Please unify whether you capitalize all words in your headings or just some.

    #Ahmed: Unified
