Re: [Last-Call] Tsvart last call review of draft-ietf-mpls-sfl-control-04

"Scharf, Michael" <Michael.Scharf@xxxxxxxxxxxxxxx> · Mon, 26 Feb 2024 15:21:32 +0000

Hi Stewart,

Thanks for the reply. More below.

> -----Original Message-----
> From: Stewart Bryant <stewart.bryant@xxxxxxxxx>
> Sent: Thursday, February 22, 2024 5:18 PM
> To: Scharf, Michael <Michael.Scharf@xxxxxxxxxxxxxxx>
> Cc: Stewart Bryant <stewart.bryant@xxxxxxxxx>; tsv-art@xxxxxxxx; draft-ietf-
> mpls-sfl-control.all@xxxxxxxx; last-call@xxxxxxxx; mpls@xxxxxxxx
> Subject: Re: [Last-Call] Tsvart last call review of draft-ietf-mpls-sfl-control-04
> 
> Dear Michael
> 
> Thank you for your review of this document.
> 
> Please see inline for discussion.
> 
> Best Regards
> 
> Stewart
> 
> > On 24 Nov 2023, at 19:20, Michael Scharf via Datatracker <noreply@xxxxxxxx>
> wrote:
> >
> > Reviewer: Michael Scharf
> > Review result: On the Right Track
> >
> > This document has been reviewed as part of the transport area review team's
> > ongoing effort to review key IETF documents. These comments were written
> > primarily for the transport area directors, but are copied to the document's
> > authors and WG to allow them to address any issues raised and also to the
> IETF
> > discussion list for information.
> >
> > When done at the time of IETF Last Call, the authors should consider this
> > review as part of the last-call comments they receive. Please always CC
> > tsv-art@xxxxxxxx if you reply to or forward this review.
> >
> > The document defines a basic control protocol over a Generalized Associated
> > Channel (G-ACh) in an MPLS network. The purpose of the simple protocol is to
> > configure certain state on an MPLS router.
> >
> > *Major issues*
> >
> > The document does not discuss many typical message transport issues,
> including
> > amongst others:
> >
> 
> > - What happens if a message (request or reply) gets lost, e.g., due to bit
> > errors, congestion, receiver failure, etc.
> 
> SB> The message exchange described in this document sits between an
> application that wishes to use this MPLS feature and the MPLS forwarding
> system. As such it delegates reliability of message exchange that request labels
> to the application. If a refresh of withdraw message is lost beyond the retry limit
> the label lifetime mechanism will ensure that they eventually die.

I disagree that a protocol specification is complete without any discussion of errors and error handling.

If error handling could just be left to the application designer, the IETF could just standardize the message formats of protocols and leave it up to implementers to figure out how to implement remaining aspects of a protocol. A lot of RFCs would be much shorter that way... But most likely the outcome would be incompatible implementations.

Now, in case of this relatively simple low-level protocol, not a lot of additional text would be needed to deal with errors. For instance, why do you not simply add your explanation to the I-D?

> > - Whether a message can get too large to get transmitted, e.g., because it
> > exceeds the MTU
> 
> SB> An application that sits this low in the network could reasonably be
> expected to know the MTU between the two routers concerned and reject the
> application request. Such a rejection is not an on the wire matter and thus out
> of scope. If the packet was successfully sent and failed to arrive due to MTU the
> previous error case arrises. If the packet was truncated it will fail the length
> check and an error would be returned.

I agree that routers may have some understanding of the configured MTU on a link from their point of view. But whether that MTU value configured in a router is really correct is a question of its own and depends on the underlying technology, which could be configured differently like the router. IMHO the text needs to have some basic discussion on the MTU. Your description could be a starting point.

It could also help to have some estimation whether typical message sizes could get of the order of typical MTU values (1500 byte, 9k byte, etc.). If the message sizes will never exceed few hundred bytes, MTU issues are most likely a seldom corner case. Otherwise, more care would be needed to specify the correct behavior in the protocol specification.

> > - What happens if a router gets overloaded, e.g., the router control plane
> > cannot handle requests
> 
> SB> Either there is no response or the response is late and that is again a matter
> for the application to resolve.

IMHO there is an interop issue: The application that needs to resolve the problem is in the sender of a message. But the actual problem could be the application in the receiver. Thus the protocol between both applications needs to be robust in case of overload. Most notably, the application in the receiver must be able to rely on reasonable behavior in the sender. Therefore, a protocol specification is needed.

Now, this specific protocol is very simple, so probably very basic mechanisms such as exponential backoff of timers (RFC 8961) are sufficient. That is not hard to add to the document.

> SB> Note that the envisaged use of this protocol is in support of low level
> network management and instrumentation applications and as such the
> application will be used in the context of a system that knows the detail of the
> network operations.

I may be in the rough part of the consensus, but I don't share that very optimistic view.

> > - (more could be added)
> 
> SB> Indeed, but I think the protocol is adequate for the application.

As TSVART reviewer, I disagree.

Note that some few additional paragraphs could cover what is currently missing.

> > Even in a well-managed MPLS network errors and failures can probably occur.
> >
> > Yet, it is impossible to determine whether the proposed protocol design is
> > indeed robust in such cases, given the lack of specification and also the lack
> > of normative guidance regarding many details.
> 
> >
> > As many design choices are left to the implementer, it is also difficult to
> > understand if different implementations would indeed correctly interop, most
> > notably in the reaction to failure cases. It is not clear whether there has any
> > experimentation with this protocol.
> 
> SB> The design choices are not placed with the implementer they are placed
> with the application designer and interoperability of the two halves of the
> application would require agreement on the matters you raise.

Then you may need a separate specification to define the interface between these "two halves of the application". I am not sure whether a further RFC makes sense here.

> > *Minor issues*
> >
> > - There is probably an unstated assumption that "Session Identifier" values
> > must be different in subsequent messages.
> 
> They need to be different in different sessions.
> 
> Note that this protocol is a direct derivative of RFC6374 in which it was
> considered obvious that different sessions would use different session identifiers,

Well, at least to me it is not obvious at all.

And if the session identifier is indeed the same in subsequent messages within the same session, a discussion is needed in the document how to deal with message duplication.

> > - To prevent congestion or receiver overload, the statement "A Querier MUST
> > wait a configured time (suggested wait of 60 seconds) before re-attempting
> > negotiation for a resource." is not sufficient. A robust protocol design would
> > typically required normative statements mandating a minimum timeout value
> and
> > an exponential timer backoff.
> 
> Given the application of this protocol and the bandwidths available in the
> networks that it is deployed in, it seems extremely unlikely that this level of
> message traffic is going to significantly stress the network. I would not be
> worried about the 3 messages about 60 seconds apart. If the messages fail to
> arrive then either the application will take an exception because the labels
> were not allocated, or the labels will die of old age. Thus I believe that the
> protocol is adequately robust for the application it is targeted at and unlikely to
> overload the network causing congestion.

According to RFC 8961, what is needed is:

1/ A MUST regarding a minimum value

2/ A MUST mandating exponential timer backoff

Note that I am also not worried much about a single 60 sec timeout.

What I am worried about is an implementer who either sends the timeout to, say, 1 msec and/or does not exponentially backoff the timer after errors. That is both allowed by the current I-D text, given the lack of normative guidance.

As TSVART reviewer, I ask to align the I-D text with RFC 8961.

Michael

> 
> >
> > *Nits*
> >
> > - In Section 1 apparently a full stop is missing after "This protocol is
> > designed for use in a well-managed MPLS network"
> 
> >
> > - In Section 1: "prodocols"
> >
> > - In the IANA section: "0x11 SFL-Unable" lacks the reference "This document"
> >
> 
> The nits are all fixed.
> 
> 
> >
> > --
> > last-call mailing list
> > last-call@xxxxxxxx
> > https://www.ietf.org/mailman/listinfo/last-call

-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call