Tsvart early review of draft-ietf-intarea-frag-fragile-09

Gorry Fairhurst via Datatracker <noreply@xxxxxxxx> · Tue, 16 Apr 2019 10:50:21 -0700

Reviewer: Gorry Fairhurst
Review result: Almost Ready

I have reviewed draft-ietf-intarea-frag-fragile-09.

This document has been reviewed as part of the transport area review
team's ongoing effort to review key IETF documents. These comments were
written primarily for the transport area directors, but are copied to the
document's authors and WG to allow them to address any issues raised and
also to the IETF discussion list for information.

When done at the time of IETF Last Call, the authors should consider this
review as part of the last-call comments they receive. Please always CC
tsv-art@xxxxxxxx if you reply to or forward this review.

Overview

This document presents advice on the use of fragmentation and what should be
supported. Most of the recommendations are presented in other RFCs, but
fragmentation is an important topic to consider and one that has received
significant recent attention. This document is almost ready, but has issues, in
particular relating to other IETF working group documents that should be
addressed before it proceeds.

(1)
“Software libraries SHOULD include provision for PLPMTUD for each supported
transport protocol.” - The above text in Section 7.2 sounds like a good
recommendation, but it may not yet be a reality for many TCP implementations
and UDP support is being specified as DPLPMTUD in TSVWG. I suggest this is NOT
an issue for this being a recommendation in a BCP in future, but to me it would
create a normative dependency on [draft-ietf-tsvwg-datagram-plpmtud].

(2) Alignment is needed on content and recommendations with
draft-ietf-intarea-tunnels. That draft explicitly speaks of the need for
fragmentation to support tunnels, and it seems that this is a little at odds
with what is described in the draft being reviewed.

It may be that the WG thinks the intended audience is different - since
draft-ietf-intarea-tunnels focuses only on nodes providing tunnel services,
whereas the draft being reviewed appears to focus on endpoints. None-the-less
both are targeting BCP status, and both are from the same WG. As an example,
Section 9 appears to be an overall discouragement on the use of fragments.

It could be that this creates a normative dependency on two active WG drafts,
currently cited as informational: [I-D.ietf-intarea-tunnels] - because this
draft needs to be absorbed with that draft when reading about tunnels. Both
appear to target BCP status.

The opportunities for confusion would be considerably reduced if the WG thought
it was able to ensure that the two documents clearly defined their own scope -
and even better if the two could be published as a document pair so that the
RFC-series clearly talks about the fragility of using fragments from endpoints
and the need to support fragments to support tunnels.

Major/Technical comments by section:

Section 2.1 on Note 1
“   So, for the purposes of this document, we assume that the IPv4
   minimum link MTU is 576 bytes.”
- I expect this is mostly true - but I question whether this extra clause is
wise to state within a BCP?
---
Section 4.4 Technical:
- Please ensure the text clearly supports the recommendation in Section 7.4. At
first reading, it was not clear whether the points were different or the same.
It was not clear to me from the text why this is described as "a partial
solution" to this problem. In other places, ECMP use of the flow label is
recommended, although I could understand that uptake of this approach has
perhaps yet to become the expected default. None-the-less it seems that if
deployed this would help alleviate other problems. Section 7.4 seems to include
a recommendation to do this, so maybe the text in section 4.4 could be
reordered to more logically lead the reader to the recommendation in
section 7.4?

I see that the current draft of draft-ietf-intarea-tunnels agrees on this
recommendation.
---
Section 4.4/4.7.3:
- The description doesn’t talk about the more generic issue that arises when
there is no return reachability from nodes in the network when different paths
are used in the forward and return directions. This is not quite the same as
the issue in 4.4, but it is a different outcome.
---
Section 5.1:
“   By contrast, PLPMTUD does not rely upon the network's ability to
   deliver ICMP PTB messages.  However, in many loss-based TCP
   congestion control algorithms, the dropping of a packet may cause the
   TCP control algorithm to drop the congestion control window, or even
   re-start with the entire slow start process.  For high capacity, long
   round-trip time, large volume TCP streams, the deliberate probing
   with large packets and the consequent packet drop may impose too
   harsh a penalty on total TCP throughput for it to be a viable
   approach.  [RFC4821] defines PLPMTUD procedures for TCP.”

- I think this is a little imprecise, please check RFC4821. I’d
suggest something like:

   By contrast, PLPMTUD does not rely upon the network's ability to
   deliver ICMP PTB messages. It utilises probe messages sent as
   TCP segments to determine if the probed PMTU can be successfully
   used across the network path. In PLPMTUD, probing is separated
   from congestion control, so that loss of a TCP probe segment
   does not cause a reduction of the congestion control window.
   [RFC4821] defines PLPMTUD procedures for TCP.
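
To illustrate the separation described in the suggested text, here is a minimal sketch in Python. It is not taken from RFC 4821: the binary search strategy, the names `ProbingSender` and `search_pmtu`, and the simulated path are all assumptions made for illustration. The only point it tries to show is that a lost probe is counted separately and does not touch the congestion window, whereas a lost data segment does.

```python
from typing import Callable


class ProbingSender:
    """Hypothetical sender model; path_pmtu stands in for the real network."""

    def __init__(self, path_pmtu: int, cwnd: int) -> None:
        self.path_pmtu = path_pmtu  # simulated; a real sender does not know this
        self.cwnd = cwnd            # congestion window, in segments
        self.lost_probes = 0

    def probe(self, size: int) -> bool:
        """Send one probe of `size` bytes; return True if it was delivered."""
        delivered = size <= self.path_pmtu
        if not delivered:
            # Probe loss is recorded, but deliberately NOT passed to
            # congestion control: cwnd is left unchanged.
            self.lost_probes += 1
        return delivered

    def on_data_loss(self) -> None:
        """Loss of an ordinary data segment: a loss-based reduction of cwnd."""
        self.cwnd = max(1, self.cwnd // 2)


def search_pmtu(probe: Callable[[int], bool], base: int, ceiling: int) -> int:
    """Return the largest size in [base, ceiling] confirmed by probe().

    Assumes `base` is already known to work on the path. Binary search is an
    assumed strategy chosen for brevity.
    """
    low, high = base, ceiling
    while low < high:
        candidate = (low + high + 1) // 2
        if probe(candidate):
            low = candidate
        else:
            high = candidate - 1
    return low


sender = ProbingSender(path_pmtu=1400, cwnd=10)
pmtu = search_pmtu(sender.probe, base=1280, ceiling=1500)
print(pmtu, sender.cwnd, sender.lost_probes)
```

In this sketch several probes are lost while searching, yet `cwnd` keeps its initial value; only `on_data_loss()` reduces it.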
---
Section 5.1:
  “While TCP will never cause the underlying IP module to emit a packet
   that is larger than the PMTU estimate, it can cause the underlying IP
   module to emit a packet that is larger than the actual PMTU.”
- Is that really true? I agree TCP never emits a normal segment that results in
a packet with a size larger than the PMTU estimate, but when performing a probe
to increase the PMTU I’d have thought this requires sending the probe with size
larger than the PMTU.
---
Section 7.1:
- Final paragraph does not directly talk about tunnels where it is not possible
to support the minimum packet size for the encapsulated packet over a path that
only supports the minimum packet size. Maybe as in Section 6, this could refer
to INTAREA Tunnels [I-D.ietf-intarea-tunnels], for more general recommendations
on what to do?
---
Section 7.3 (echoed in Section 7.5):
“Middle boxes SHOULD process IP fragments”
- Process seems a vague term. … Please check this recommendation does not
contradict this for NATs (e.g., RFC 4787).
- Technical question on Section 7.3: How is a NAT expected to work with UDP
checksum fragments for a zero checksum? Section 5.3.1 states the NAPT should
recalculate based on reassembly of the entire datagram.
---
Section 7.4:
- Appears to need RFC references to ECMP (e.g., [RFC6438] and [RFC7098]), LAG,
etc. It may in this case also be useful to comment on GRE in UDP... which
clearly had other intentions and where this BCP therefore creates new
precedents.
- [RFC6438] appears to be cited normatively.
---
Section 7.5:
 "including making sure the network generates PTB packets when dropping
   packets too large compared to outgoing interface MTU."
- As a BCP we need to be clear, this seems open to misinterpretation, because
ICMP PTB messages are usually rate-limited (practically this may happen at
multiple levels), and I presume this BCP does not seek to change that. Careful
wording is hence required here so as not to imply that every "large" packet
needs to generate a "PTB Message".
---
Section 7.5:
"   Network operators SHOULD NOT filter IP fragments if they originated
   at a domain name server or are destined for a domain name server."
- It does not say why this specific requirement identifies one service (DNS),
rather than a general requirement. In particular, does this require devices,
such as NAPT and Firewalls, to keep state just for this one protocol? If so,
why? That's not to say I don't agree, it is just I do not see this explained.
(Is this motivated by DNSSEC or some other use?)
---
Section 9:
- This appears to be an overall discouragement on use of fragments, whereas the
current work on INT Area tunnels appears to say in specific cases fragments are
necessary - see overview comment.

------------------------------------------------------

Minor/Editorial notes by section:

Section: Abstract
- This is short, probably the document would be better with more of a
description about what this particular draft describes. This comment also
echoes the overview comment saying there is a need to help a reader understand
whether they should read this document or draft-ietf-intarea-tunnels or both.
---
Section 1:
"Operational experience [Kent] [Huston] [RFC7872] reveals that IP
 fragmentation reduces the reliability of Internet communication.
 This document describes IP fragmentation and explains how it reduces
 the reliability of Internet communication.  "
- I argue that this text uses the word "reliability" a little wrongly from a
transport perspective. I like the word "fragile" that was introduced and I
expect the sentence could be reworked to talk in terms of "increases the
fragility".
---
Section 1:
- The following text in the Introduction could be over-stated:
“ Furthermore, fragmentation
   is expected to work in limited domains where security and
   interoperability issues can be addressed.”
- I suggest thinking about whether the word “limited” could be removed from
this sentence. Any domain that can resolve security and interoperability
issues can also deploy this.
---
Section 2.1:
“ PMTUD is susceptible to attack because ICMP messages are easily
attack because ICMP messages are easily
      forged [RFC5927].”
- I am unsure this is the actual problem, so I'd argue against this text. To
me, the security-related problem is that the source node that receives the PTB
message does not or cannot authenticate that the PTB message originated from a
node that actually was on the path being used. (The topic of validating ICMP is
further discussed in draft-ietf-tsvwg-datagram-plpmtud.)
---
Section 2.1:
"...because Internet paths are dynamic, PMTU is also dynamic."
- The current text should be improved: I think I understand that this means the
set of links and routers forming a path can change (due to link failure, load
sharing, re-configuration, etc). When the path changes, the PMTU can also
change.
---
Section 2.1:
- Note 1 describes ICMP for IPv4, but does not refer to RFC1812. This is the
place where IPv4 routers are required to return a quoted packet with as much of
the original datagram as possible without the length of the ICMP datagram
exceeding 576 bytes. Please add this as a reference, since it is important to
validating that ICMP PTB messages are received on-path.
---
Section 4:
- The title implies this is about reliability, although I would argue that this
is not the case, and the word is mis-used. Would "Increased Fragility" work, or
something else that does not imply the communication is less reliable? It could
be less reliable for a UDP transport that is unable to recover, whereas a TCP
session that is black-holed (i.e. unusable) is still reliable as a service (bad
stuff is not delivered). However that's kind of a higher-level issue.
---
Section 4.1:
"IP Fragmentation causes problems for routers that implement
   policy-based routing."
- Is it always true? This statement seems incorrect, although I would agree
with the assertion that problems can arise when fragmentation is used and
policies refer to information only available in the first fragment.
---
Section 4.2:
- This section does not refer to the appropriate transport area drafts issued
by the BEHAVE working group.
- It would be useful to refer to RFC 5508 on the ALG requirements for a NAPT
processing ICMP and probably should refer to Hairpinning Support for ICMP
Packets.
- [RFC4787] is a BCP describing NAT handling of IPv4 Fragments. I suggest you
also check [RFC6864], which is a PS describing the update to the IP-ID, and
rewriting middleboxes.
---
Section 4.2:
“Virtual reassembly in the network is problematic”
- I agree, but I suggest the document explains the meaning of “virtual
reassembly” and why this is “problematic”.
---
Section 4.4:  Equal Cost Multipath (ECMP), Link Aggregate Groups (LAG)
- Some RFC references would be good here to ECMP, LAG, etc.
---
Section 4.4:
  "it will assign all fragments belonging to a packet to
   the same link."
- Section 4.1 describes this in terms of a FIB entry, is that different to a
link here? or the same?
---
Section 4.6:
“NIDS aims at identifying malicious activity”
- Please define “NIDS”.
---
Section 4.6:
  "Each IP fragment contains an "Identification" field that destination
   nodes use to reassemble fragmented packets.  Many implementations set
   the Identification field to a predictable value, thus making it easy
   for an attacker to forge malicious IP fragments that would cause the
   reassembly procedure for legitimate packets to fail."
- This appears to describe a broken pathology, rather than an RFC
recommendation. I think there is always a danger with such wording that people
cite the pathology as acceptable. I would strongly encourage the text to be
reworked to explain that RFCxxxx requires xxxx, and then state the issue.
- In this case I suggest you also refer to the specification in [RFC6864].
---
Section 4.7:
 "It is subject to both transient and persistent loss."
- Please reword to specify what "it" is.
- This section needs to define what transient loss means before section 4.7.1.
I suggest this is subtle, since there are various reasons why a specific ICMP
message may not be received.
---
Section 4.7.1:
- I suggest you also add link loss, mentioning wireless networks as an example
of links where propagation conditions could in some circumstances prevent
delivery.
---
Section 5.1:
“Stream Control Protocol (SCP) [RFC4960]”
- Is this intended to be SCTP? (The reference suggests this).
---
Section 5.1:
- The PLPMTUD method can also be used with [I-D.ietf-tsvwg-udp-options].
---
Section 5.2:
" [RFC8085] recognizes that IP fragmentation reduces the reliability of
 Internet communication. "
- I'd argue that it "can reduce the reliability and efficiency of
communication", rather than it does.
---
Section 6.1 DNS:
- Please expand “DNS” and cite an RFC.
---
Section 6.2 OSPF:
- Please expand “OSPF” and cite an RFC.
---
Section 6:
- This speaks about packet-in-packet encapsulation, but does not explicitly
include tunnels in the title. The distinction may be intentional since another
ID targets BCP advice on tunnels. However, I think the draft needs to have a
section title that notes the differences for tunnels and points to that draft,
or this section needs to include tunnels as well as endpoint encapsulation.
---
Section 6.4:
- I found the title of this section a little odd. Is it possible that the title
here could be changed to reflect "Other upper layer protocols that rely on
fragmentation" and then cite this (slightly esoteric) link protocol as another
example? I am pretty sure that we don't wish to enumerate every example of a
UDP app that relies on IP fragmentation to work, so an example seems a good way
to note this.
- It may also be quite wise (?) to note that Section 3.2 of RFC8085 is a BCP
that discourages this design, so that this is not used as a BCP to justify this
design.
---
Section 7:
"and ensuring that the transport protocol
   in use adapts its segment size to the MTU."
- This needs to be "Path MTU", they all adapt to the local link MTU.
---
Section 7:
- The final sentence should include a reference to PLPMTUD and DPLPMTUD.
---
Section 7.1:
- This is restating Section 3.2 of RFC8085 - which should be cited. The text
there states:
 " Applications that fragment an application-layer message into multiple
   UDP datagrams SHOULD perform this fragmentation so that each datagram
   can be received independently, and be independently retransmitted in
   the case where an application implements its own reliability
   mechanisms."
---
Section 7.1:
 "Developers MAY develop new protocols or applications that rely on IP
   fragmentation."
- How can this be enforced? (Is this really a "MAY" in RFC 2119?, or is
it just the result of not taking the SHOULD in RFC8085?)
---
Section 7.3:
- Although not strictly necessary, I would recommend avoiding the doubt when
this BCP is (mis)quoted, by explicitly changing /may/ to /could/ in the
following sentence:
  "These stateless middle
   boxes may perform sub-optimally, process IP fragments in a manner
   that is not compliant with RFC 791 or RFC 8200, or even discard IP
   fragments completely."
---
Section 7.3:
"   Protocols may be able to avoid IP fragmentation by using a
   sufficiently small MTU (e.g.  The protocol minimum link MTU)."
- In previous WG documents, I've cited RFC8085 section 3.2 to assert this. I
think this should be referenced here. That states:
"Applications that do not follow the recommendation to do PMTU/PLPMTUD
   discovery SHOULD still avoid sending UDP datagrams that would result
   in IP packets that exceed the path MTU.  Because the actual path MTU
   is unknown, such applications SHOULD fall back to sending messages
   that are shorter than the default effective MTU for sending (EMTU_S
   in [RFC1122])."
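
As a worked illustration of what that fallback means in bytes, here is a small Python sketch. It assumes the default EMTU_S values given in RFC 8085 Section 3.2 (for IPv4 the smaller of 576 bytes and the first-hop MTU, for IPv6 1280 bytes), an IPv4 header without options and an IPv6 header without extension headers. The function names are hypothetical and chosen only for this example.

```python
UDP_HEADER = 8               # bytes
IPV4_HEADER_NO_OPTIONS = 20  # assumption: no IPv4 options present
IPV6_HEADER_NO_EXT = 40      # assumption: no IPv6 extension headers present


def default_emtu_s(ip_version: int, first_hop_mtu: int) -> int:
    """Default effective MTU for sending when the path MTU is unknown."""
    if ip_version == 4:
        return min(576, first_hop_mtu)
    if ip_version == 6:
        return 1280
    raise ValueError("ip_version must be 4 or 6")


def max_udp_payload(ip_version: int, first_hop_mtu: int) -> int:
    """Largest UDP payload that keeps the IP packet within the default EMTU_S."""
    ip_header = IPV4_HEADER_NO_OPTIONS if ip_version == 4 else IPV6_HEADER_NO_EXT
    return default_emtu_s(ip_version, first_hop_mtu) - ip_header - UDP_HEADER


print(max_udp_payload(4, 1500))  # 576 - 20 - 8
print(max_udp_payload(6, 1500))  # 1280 - 40 - 8
```

Under these assumptions an application that does no PMTU discovery would keep UDP payloads at or below 548 bytes for IPv4 and 1232 bytes for IPv6; header options or extension headers would reduce these figures further.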
---
Section 7.4:
- This appears to reaffirm the existing advice in [RFC6438] and [RFC7098] and
these need to be cited.
---
Section 7.5:
- This lacks RFC references, but names RFCs - should these be changed to
citations?
---
Section 9: Security Considerations.
- Many cited RFCs note security-related issues directly applicable to the use
of fragments. It would be really helpful if key security recommendations are
referenced in the security considerations section.

------------------------------------------------------

References:

* Key documents are not referenced.

* Normative requirements are placed on documents that are not listed as
normative. These include two drafts currently awaiting WGLC:
        draft-ietf-intarea-tunnels
        draft-ietf-tsvwg-datagram-plpmtud

* Additional Suggested References:

[RFC4787] is a BCP describing NAT handling of IPv4 Fragments

[RFC5722] is PS and appears to be something that could usefully be cited
normatively in a BCP on this topic.

[RFC6438] is a PS that appears to be cited normatively.

[RFC6864] is a PS describing the update to the IP-ID, and rewriting middleboxes.

[RFC5508] is a BCP that describes the ALG requirements for a NAPT processing
ICMP.