Thanks for the review Gorry, please see below. On 10/7/2011 4:28 AM, Gorry Fairhurst wrote:
Hi, all, I've reviewed this document as a part of the transport area directorate's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors for their information and to allow them to address any issues raised. The authors should consider this review together with any other last-call comments they receive. Please always CC tsv-dir@xxxxxxxx if you reply to or forward this review. The document describes a protocol that is designed to use a TCP or SCTP transport. The use of TCP is application-limited and a negotiation mechanism is provided that helps select whether TCP or SCTP should be used. The note below includes transport issues, as well as additional comments that I hope are also useful. I hope this feedback will be useful to the authors, and will be glad to provide further assistance either on- or off-list as useful. Gorry ----- Overall this document seems complete and uses standard IETF transport mechanisms that support congestion control. Introduction: Recommend to specify the IANA-assigned port. Since this specifies a protocol that runs over a specific IANA-allocated port it would be good if the port number was mentioned in the Introduction. This may confirm that this is the correct document to read to find out about what happens when port 8471 is used (e.g. someone interpreting IPFIX or building a NAT, Firewall, etc).
OK, can do that. I think they would typically check the IANA registry first though?
Section 4 (and others, including section 7): AF for transport. I understand there is a rule to prefer SCTP over TCP when both transports are available. This seems OK. Various sections refer to the support for multiple address families (IPv4, IPv6) and I understand that the PORT information for an AF does need not to be carried over a transport connection using the same AF. What I do not yet understand is how PORT knows whether to use an IPv4 transport for IPv6 or to use an IPv6 transport for IPv6. I'd like this to be clear, so that I can understand what happens when TCP is available over Ipv4 and SCTP only over IPv6 and similar combinations.
Which address family and address to use is determined by the Connection ID in the PORT Hello Option. It is up to the implementation and admin to choose which address the connection should go to. There are several implementation choices, but I think it will be common to use the primary interface address (the source address of PIM messages) as a default Connection ID, but allow the administrator to configure another address. The administrator can then choose which address and family to configure.
Section 4.2: TCP keep-alive. SCTP heartbeat is described. I understand PORT also includes an optional keep-alive mechanism using the transport service. TCP also contains an optional keep-alive mechanism. This should be mentioned. Is TCP keep-alive recommended for PORT or does the protocol recommend a keep-alive at the PORT layer?
We chose to have our own keep-alive mechanism, as we believe that is better. There is nothing stopping an implementation from enabling TCP keep-alive if they want though. I guess we could do this change: OLD: SCTP has a heart beat mechanism that can be used to detect that a connection is working, even when no data is sent. NEW: SCTP has a heart beat mechanism that can be used to detect that a connection is not working, even when no data is sent. Many TCP implementations also support sending keep-alives for this purpose. Note that I changed "working" to "not working", I think it makes more sense that way. We're trying to not give any specific recommendations here, only enough to ensure interoperability. Both TCP keep-alive and the use of PORT keep-alive can be decided by independently by the peers. Note that this is an experimental protocol, and hopefully what works best, or is the preferred solution will become clearer with real use.
Section 4.2: TCP Reset The present text says that the TCP connection will be reset "after a few reties". This does not seem clear. A lack of acknowledgement for a sent data segment will result in the underlying TCP tansport retransmitting after the Retransmission Time Out (RTO). Furthermore this would cause the RTO to be doubled with each retransmission, until the RTO exceeds the maximum value when the connection will be reset, this is typically many 10s of seconds later.
OK, what if I change it into "a few TCP retransmissions"? The main purpose here is to explain that we notice the connection is going down as long as we periodically send data. We're not going into details about exactly how long it takes.
Section 5: Unexpected corruption of the stream It is good to see the service using the transport, in this case PORT verifies the integrity of the transported data. The recommendation seems to be that PORT skips any unknown data. While this may be good for interoperability between different flavours of the protocol, it is not good in terms of robustness, which could be an issue for long-lived connections. I suggest that if such errors occur they should be made visible, so that operators are aware that there are either PORT protocol issues or transport issues. That way a high count of such errors would alert a possible underlying problem.
OK, we could certainly say that there SHOULD be some kind of alert with a large amount of unknown options. However, if we later define a new option, then an old implementation could see a lot of messages with unknown options, while there is nothing wrong.
Could a single error in the transport result in loss of farming? I suspect so. I think it should be considered whether it be wise to close the connection (and reopen a new transport connection) after a repeated number of such errors.
Yes, it could. I don't know if it is common for TCP based protocols to guard against it, but I understand there is some value in detecting it. I'm not sure if we need to guard against broken implementations that say pass an incorrect length. If there is a transmission error, like a few bits in a packet getting mangled (I don't know how likely that is, but it can happen), then we are in trouble only if the length of a TLV gets modified. If that happens, we lose framing and will never recover. We may then get all unknown options. I have some concerns with saying that an implementation should reset the connection if only unknown options are received then. How to do that without affecting backwards compatibility. Recommendations on this are certainly welcome. I just spotted one issue with the draft btw. I was just thinking whether the PIM checksum. We are saying that we do the regular PIM error checks, which I think includes verifying the PIM checksum. This will help us detect some bit errors. We should probably be a bit more specific about the PIM checksum though. And in particular, for IPv6, the pseudo-header is supposed to included, but it is unclear what the value of the pseudo-header fields should be.
Section 9: ? The opening para concludes: /This the case even when the connection is down./ - I can see this case needs to be mentioned, but I am not sure from this text what exactly I am to understand.
We are saying that we will only use SCTP/TCP even if the connection is down. The main point is that we are not falling back to using native PIM joins/prunes. Do you want me to add that?
Section 11.4 This section describes critical and non-critical options. I could not find guidance for IANA on how to now whether a new option should be classified as one or the other.
This is simple. Any draft/RFC requesting an allocation would need to specify whether the option(s) are critical. IANA doesn't know...
--- There are two general observations about this usage of TCP, I leave it to the TSV ADs to decide if any of these topics should be discussed further: 1) As I understand the protocol, it is application-limited, in that it will not usually use the entire TCP congestion window, but rather sends "occasional" small messages under the control of PORT. It seems important that such use does not grow an unbounded cwnd and then emit large bursts of data into the path when the application produces bursts of data that exceed a few MSS. It may be wise to consider a TCP stack that includes rate-limiting or burst-limiting. RFC 3645 (EXP) may also help in this case.
I think I understand. Basically if we for a long period of time only send individual infrequent messages, the cwnd can grow and grow. And basically be too large if we suddenly want to send a large amount (a burst). Is that it? I'm not sure what we can do about it. We can maybe just describe the issue? RFC 3645 seems to be unrelated, is that the right number?
2) Since PORT may generate one or two segments of data per interval, TCP retransmission may be slow following loss, since there may not be sufficient to trigger fats retransmission using DupAcks. Limited transmit (RFC 3042, STD) may help in this case.
Yes, I think this could be an issue. Also here I guess we could perhaps just describe the issue? Maybe this and the above could be described in a short transport considerations section? These are also among the things that I hope there can be more experience with. This is an experimental protocol, and we may learn over time what is the most appropriate...
--- Editorial/General comments: Section 6: Use of keywords The opening para contains two statements that could need to be RFC 2119 keywords: /This is done for all PORT joins and PRUNES/ is this a MUST? /It may be done for..? is this a MAY?
In a way I would say that we are just stating the facts. There is no choice, it just has to be done for PORT. It also says shortly after: The set of ET neighbors MUST include the PORT neighbors. But I think it is good to say "This MUST be done for all PORT joins and prunes". And "It MAY be done". Will change that.
Page 6, section 2 /PORT capable/PORT-capable/
So basically PORT-capable throughout the document?
page 13, section 4.2 /until you actually try to send/until the PIM PORT module then tries to send/
I haven't used the term PIM PORT module, and not sure if want to introduce it. How about saying just "until PORT tries to send".
Section 10: ? This section describes security threats. Please could the editors identify whether these are on-path attacks or off-path attacks.
I would say off-path, unless unicast-RPF or similar checking is done. This is similar to any other TCP spoofing and injection. Nothing different for PORT. I could add a sentence on that, but I don't really think it is necessary. Stig _______________________________________________ Ietf mailing list Ietf@xxxxxxxx https://www.ietf.org/mailman/listinfo/ietf