Reviewer: Stewart Bryant
Review result: Almost Ready

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair. Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document: draft-ietf-6man-rfc1981bis-04
Reviewer: Stewart Bryant
Review Date: 9/Feb/2017
IETF LC End Date: 1/Mar/2017
IESG Telechat date: unknown

Summary: This draft is on the right track but has open issues,
described in the review.

This review, together with the lengthy discussion on the IETF list,
suggests that this draft has a number of issues that need to be
addressed before publication.

I wonder if we would best serve both our future and our heritage
if we declared RFC1981 as historic, and either left the idea there,
or declared it as historic and wrote a new text from a clean start?

Major issues:

Nits points out a number of faults with the document, but the only
one of substance is:

The text has a lot of RFC2119 language, but no RFC2119 declaration.

The document could use a thorough RFC2119 scrub.

The document lists the three original authors, one with an affiliation
change, but no email addresses. Has this been agreed with the original
authors, and have arrangements been put in place for the RFC editor
to process auth48?

It is concerning that the draft does not talk in any detail about
how modern ECMP works, i.e. using the five tuple, and noting that
the PMTU may be different depending on the transport layer port
numbers.

Given that a very large fraction of packets will traverse an MPLS
network at some point, I am surprised that there is no text talking
about the importance of providing support for this feature in the
MPLS domain. RFC3988 talks to this point, but is only experimental.
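
To illustrate the ECMP concern above, here is a minimal sketch. It is
not taken from the draft or from any router implementation. The names
FiveTuple, PATH_MTUS, ecmp_path, path_mtu and per_destination_pmtu, the
SHA-256 based hash, and the three MTU values are all assumptions made
for illustration. It shows that flows to the same destination can land
on different members of an ECMP set purely because of their port
numbers, and that a PMTU cache keyed only on the destination ends up
holding the smallest MTU in that set.

```python
import hashlib
from typing import Iterable, NamedTuple


class FiveTuple(NamedTuple):
    """Hypothetical flow key: addresses, protocol number, and ports."""
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


# Assumed ECMP set: three equal-cost paths with different path MTUs.
PATH_MTUS = [1500, 1480, 1400]


def ecmp_path(flow: FiveTuple, n_paths: int) -> int:
    """Pick a path index by hashing the five tuple.

    Illustrative only: real routers use their own hash functions.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def path_mtu(flow: FiveTuple) -> int:
    """MTU of the path that this flow is hashed onto."""
    return PATH_MTUS[ecmp_path(flow, len(PATH_MTUS))]


def per_destination_pmtu(flows: Iterable[FiveTuple]) -> int:
    """PMTU a per-destination cache converges on: the minimum over flows."""
    return min(path_mtu(flow) for flow in flows)


if __name__ == "__main__":
    # Same source and destination, only the source port varies.
    flows = [
        FiveTuple("2001:db8::1", "2001:db8::2", 6, sport, 443)
        for sport in range(49152, 49160)
    ]
    for flow in flows:
        print(flow.sport, "-> path", ecmp_path(flow, len(PATH_MTUS)),
              "MTU", path_mtu(flow))
    print("per-destination PMTU:", per_destination_pmtu(flows))
```

Keying the cache on the five tuple instead of the destination would let
each flow use the MTU of the path it actually takes, at the cost of a
larger cache.
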
======

If flows [I-D.ietf-6man-rfc2460bis] are in use, an implementation
could use the flow id as the local representation of a path. Packets
sent to a particular destination but belonging to different flows may
use different paths, with the choice of path depending on the flow id.
This approach will result in the use of optimally sized packets on a
per-flow basis, providing finer granularity than PMTU values
maintained on a per-destination basis.

SB> How widely is flow-id supported in networks? I thought that the
SB> current position was that it was unreliable as an ECMP indicator
SB> and thus routers tended to glean information from the packet themselves.

======

Note: if the original packet contained a Routing header, the Routing
header should be used to determine the location of the destination
address within the original packet. If Segments Left is equal to
zero, the destination address is in the Destination Address field in
the IPv6 header. If Segments Left is greater than zero, the
destination address is the last address (Address[n]) in the Routing
header.

SB> So this has the effect that a traffic engineered packet and
SB> a non-traffic engineered packet will have the lower of the
SB> two PMTUs. This was all harmless when source routing was a curiosity
SB> as far as mainstream networking was concerned, but may be
SB> more of a problem as a result of the SPRING work.

=======

5.3. Purging stale PMTU information

Internetwork topology is dynamic; routes change over time. While the
local representation of a path may remain constant, the actual
path(s) in use may change. Thus, PMTU information cached by a node
can become stale.

If the stale PMTU value is too large, this will be discovered almost
immediately once a large enough packet is sent on the path. No such
mechanism exists for realizing that a stale PMTU value is too small,
so an implementation should "age" cached values.
When a PMTU value has not been decreased for a while (on the order of
10 minutes), the PMTU estimate should be set to the MTU of the
first-hop link, and the packetization layers should be notified of the
change. This will cause the complete Path MTU Discovery process to
take place again.

SB> Should that be an RFC2119 SHOULD?

SB> The impact of this advice is going to be a disruption to what might
SB> be a critical service every 10 mins.

SB> Should there be some advice along the lines of noting the
SB> importance of service delivery as part of deciding whether to
SB> test for bigger PMTU vs improving efficiency?

=======

Minor issues:

IPv6 defines a standard mechanism for a node to discover the PMTU of
an arbitrary path.

SB> Do you mean "This document defines ....."? Otherwise this needs
SB> a reference.

=======

An extension to Path MTU Discovery defined in this document can be
found in [RFC4821]. It defines a method for Packetization Layer Path

SB> Rather than have the reader figure out what "It" is, perhaps
SB> s/It/RFC4821/

=======

Upon receipt of such a message, the source node reduces its assumed
PMTU for the path based on the MTU of the constricting hop as reported
in the Packet Too Big message.

SB> We should perhaps state up front that this procedure
SB> hunts for the worst case of the ECMP set associated with the
SB> ingress node's PMTU classifier.

=======

If a node receives a Packet Too Big message reporting a next-hop MTU
that is less than the IPv6 minimum link MTU, it should discard it.

SB> Should that be an RFC2119 SHOULD?

=======

5.2. Storing PMTU information

Ideally, a PMTU value should be associated with a specific path
traversed by packets exchanged between the source and destination
nodes. However, in most cases a node will not have enough information
to completely and accurately identify such a path. Rather, a node
must associate a PMTU value with some local representation of a path.
It is left to the implementation to select the local representation of
a path.

SB> Is it worth noting the five tuple since that is how a lot of
SB> load balancers work?

=======

The set of paths in use to a particular destination is expected to be
small, in many cases consisting of a single path.

SB> I am not sure that remains true in modern networks.

=======

One approach to implementing PMTU aging is to associate a timestamp
field with a PMTU value. This field is initialized to a "reserved"
value, indicating that the PMTU is equal to the MTU of the first hop
link. Whenever the PMTU is decreased in response to a Packet Too Big
message, the timestamp is set to the current time.

Once a minute, a timer-driven procedure runs through all cached PMTU
values, and for each PMTU whose timestamp is not "reserved" and is
older than the timeout interval:

- The PMTU estimate is set to the MTU of the first hop link.

- The timestamp is set to the "reserved" value.

- Packetization layers using this path are notified of the increase.

SB> Such detailed implementation advice is uncommon in modern RFCs. It has
SB> the disadvantage of de-facto standardizing something that should be left to
SB> the innovation of the implementer.

=======

5.4. TCP layer actions

SB> TCP implementations have moved on a lot since this section was
SB> written. Is this still current best practise?

=======

5.5. Issues for other transport protocols

Some transport protocols (such as ISO TP4 [ISOTP]) are not allowed to
repacketize when doing a retransmission.

SB> How much TP4 is there going over IPv6? Doesn't this example
SB> show the IETF as not being in the modern age?

=======

Nits/editorial comments:

upper layer   a protocol layer immediately above IPv6. Examples are
              transport protocols such as TCP and UDP, control
              protocols such as ICMP, routing protocols such as OSPF,
              and internet or lower-layer protocols being "tunneled"
              over (i.e., encapsulated in) IPv6 such as IPX,
              AppleTalk, or IPv6 itself.

SB> Everything in the list above is in the well known list, except
SB> IPX, so technically it needs expansion. However it might be nice
SB> to use some modern example in common use.

=======

link          a communication facility or medium over which nodes can
              communicate at the link layer, i.e., the layer
              immediately below IPv6. Examples are Ethernets (simple
              or bridged); PPP links; X.25, Frame Relay, or ATM
              networks; and internet (or higher) layer "tunnels",
              such as tunnels over IPv4 or IPv6 itself.

SB> Technically X.25 needs a reference, since it is not "well known"

=======

path          the set of links traversed by a packet between a source
              node and a destination node.

SB> Is it a set of links, or a set of links and nodes?

========

the value of MMS_S, the "maximum send transport-message size".

SB> The modern convention is full-name(abbreviation)

========

The Sun Network File System (NFS) uses a Remote Procedure Call (RPC)
protocol [RPC] that, when used over UDP, in many cases will generate
payloads that must be fragmented even for the first-hop link. This
might improve performance in certain cases, but it is known to cause
reliability and performance problems, especially when the client and
server are separated by routers.

SB> Perhaps this should point to RFC7530 (the current NFS Spec), assuming
SB> the behaviour description is still correct.

=========

The former can be accomplished by associating a flag with the path;
when a packet is sent on a path with this flag set, the IP layer does
not send packets larger than the IPv6 minimum link MTU.

SB> We do not normally give this level of implementation advice

========================