Responding to Cullen's comments on draft-ietf-behave-nat-behavior-discovery

Many of the questions you raise point to the same question of whether
tests or techniques that are known to fail on a certain percentage of
NATs under a certain percentage of operating conditions are
nevertheless valuable. behavior-discovery has an applicability
statement
http://tools.ietf.org/html/draft-ietf-behave-nat-behavior-discovery-06#section-1
that discusses those issues in some detail. I spent enough time
wording that statement and discussing it with various people that I
think it is best to refer to that statement.

You also repeatedly use phrases such as "basically won't work" and
"it might work." This comes down to the value of "certain percentage"
as used above. My experience with these techniques, and the
experience of those who have used such techniques recently, is that
they are far more reliable than that, into the 90% range, particularly
when used correctly. That is not high enough that we could go back to
3489 (all techniques require fallbacks because they fail, and 90% is
far, far too low a success rate), but it is high enough that
applications can make useful decisions based on that information,
provided they have a fallback in cases where the information is wrong.
And those are the conditions of the experiment.

On Tue, Mar 31, 2009 at 11:08 PM, Dan Wing <dwing@xxxxxxxxx> wrote:
> Forwarded for those that don't follow the main IETF list.
>
> -----Original Message-----
> From: ietf-bounces@xxxxxxxx [mailto:ietf-bounces@xxxxxxxx] On Behalf Of Cullen
> Jennings
> Sent: Tuesday, March 31, 2009 9:53 AM
> To: IETF Discussion; IESG IESG
> Subject: Re: [BEHAVE] Last Call: draft-ietf-behave-nat-behavior-discovery
> (NAT Behavior Discovery Using STUN) to Experimental RFC
>
>
> I was somewhat shocked to see the draft in IETF Last Call.
> The last time this draft was discussed at the microphone in Behave,
> many people were very concerned that it is not possible to correctly
> characterize a NAT

This is not true. behavior-discovery was briefly presented at IETF71
without any comments and at IETF70 with only minor comments. The
last time it was discussed at length at the mic was at IETF69, which
was where it was decided to change it from standards track to
experimental. However, let me address two specific points here
regarding your characterization of that discussion:

"many people were very concerned": What concerns people had were
about its previous standards-track status. Subsequent feedback on the
list and at IETF70 has indicated these concerns are resolved.

"that it is not possible to correctly characterize a NAT": First, let
me emphasize that the key distinction between 3489 and
behavior-discovery is that behavior-discovery is very clear that it is
not possible to characterize a NAT, that only snapshots of behavior
for a particular source-dest tuple at an instant in time are possible.

> without using more than one address behind the NAT. The tests
> done on NATs by the researchers at MIT did that, so did the
> stuff from Cornell, as did draft-jennings-behave-test-results.

Multiple addresses are definitely required to characterize the NAT (to
the extent it's ever possible), but as behavior-discovery is very
clear that it is not trying to replicate that aspect of the 3489
behavior, this is not precisely relevant.

> The reason why this was needed is largely the reason why the IETF
> invented ICE. Initially folks thought that STUN alone would be enough
> to do NAT traversal. This turned out not to be true, STUN deprecated
> those parts and ICE was started. This draft fails to describe the
> types of test that have actually been found to work and just
> reinstates the stuff that was deployed and failed and then deprecated
> out of STUN.
This draft makes no claim that it is duplicating or attempting to
mimic the original intention of 3489 or the capabilities of ICE. It
carefully describes when the tests it includes can be used and
presents examples of how an application might make use of it for
situations that ICE does not address. The only use by applications
proposed in the draft (as an experiment) is for an application that
uses it for initial mode selection but is capable of adapting to its
actual experience on the network.

> Now this draft pays some lip service to the fact that it basically
> won't work. You can read section 1 and get the full idea.

This term "basically won't work" is a gross oversimplification. It's
also not a technical analysis, which makes it difficult to respond to
in a technical way.

More generally, one of the important differences between 3489 and ICE
is that ICE ensures there is always a fallback to TURN, and thus
avoids the problem experienced by 3489-based applications that tried
to determine in advance whether they would need a relay and what their
peer reflexive address would be, both of which are impossible.
behavior-discovery requires an application using it to have a
fallback, but unlike ICE, with its focus on the problems inherent in
VoIP sessions, it doesn't assume that it will only be used to
establish a connection between a single pair of machines, and so
alternative fallback mechanisms may make sense. For example, in a P2P
application, it may be possible to simply switch out of the role where
such connections need to be established, or to select an alternative
indirect route, if the peer discovers that in practice 10% of its
connection attempts fail.

> The first and 2nd para basically say this won't work. Then para 3
> proposes this is an experiment to find out something we already know
> the answer to.

The experiment described is so totally different from 3489's claim
that a NAT can be characterized, labeled, and all future application
decisions rely on that behavior that it's hard to respond to this.
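
To make the "initial mode selection, then adapt" pattern above a
little more concrete, here is a minimal sketch. It is not taken from
the draft: the class name, the mode labels, the 10% threshold (echoing
the figure used above) and the minimum sample size are all hypothetical
choices made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModeSelector:
    """Pick a starting mode from a discovery hint, then adapt to real results.

    Every name and default here is hypothetical; none comes from the draft.
    """
    hint_says_direct_ok: bool         # snapshot result from behavior discovery
    failure_threshold: float = 0.10   # assumed cutoff, echoing the 10% figure
    min_attempts: int = 20            # assumed: gather some evidence first
    attempts: int = 0
    failures: int = 0

    def __post_init__(self) -> None:
        # The hint only selects the starting mode; it is never trusted blindly.
        self.mode = "direct" if self.hint_says_direct_ok else "fallback"

    def record(self, succeeded: bool) -> str:
        """Record one connection attempt and return the mode to use next."""
        self.attempts += 1
        if not succeeded:
            self.failures += 1
        too_many_failures = (
            self.attempts >= self.min_attempts
            and self.failures / self.attempts > self.failure_threshold
        )
        if self.mode == "direct" and too_many_failures:
            # Observed experience overrides the initial hint.
            self.mode = "fallback"
        return self.mode
```

What "fallback" means is left to the application: a relay, an indirect
route, or simply leaving the role that needs inbound connections.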
> When this work was chartered, it was about making a way to
> characterize NATs and describe them in a controlled lab like
> environment.

Here is how the work was chartered in the May 2007 update to the BEHAVE charter:

Sep 2007  Submit standards-track document that describes how an
application can determine the type of NAT it is behind

So it was not at all chartered for lab analysis; it was chartered for
use by an application.

> It was not about resurrecting exactly the part of STUN
> that had been tried, failed, and deprecated.

As already stated, it deliberately tries to outline when these
techniques are applicable and when they aren't.

>
> Specific problems with the draft....

For other readers' benefit, the section numbers you use in this
section refer to revision -04 of the draft. The current revision is
-06.

>
> 2.2 - this just won't work. The test described in this draft will not
> find out if the node is behind an endpoint independent NAT. I have
> specific NATs where it won't work. I have explained to the authors why
> it won't work. The answer I get back is "it might work some of the
> time". It's true it might work some of the time but we all agree there
> are many NATs for which it will not work.

(I'm not sure what section this text was referring to.)

Again, we're not searching for an existence proof of NATs where it doesn't work.

More importantly, don't put words into other people's mouths,
especially when the statement is not true. You have never received an
"it might work some of the time" response. The response has always
been of the form that it works most of the time on most NATs.

If you have useful information on the population of NATs that fail
with these techniques a significant amount of the time, please share
that information. I've asked, but have not received any information.

>
> Other sections that don't work are 3.1, 3.2, 3.3, 3.4, 3.5, 3.5 - uh
> all of them actually.
> I'm glad to provide details on why they don't
> work but I have in the past and we're not really debating if they work or
> not. The authors believe there is sufficient text at the beginning of
> the draft in section 1 that it is OK that these fail in many cases and
> don't need to be mentioned again. We're not debating that these work
> some of the time but not all the time - everyone agrees on that.
>
> Section 4.1 - The results in here will be just wrong for ports
> different than the one the test was run on. The response to this was
> to add "use same port when possible". That's not going to exactly
> cause applications to work.

First, this is something of a corner case in any event. Secondly, a
large number of applications do use the same port for all of their
communication. So, yes, they are perfectly capable of allocating one
port (or a small number of ports), testing, and using that (those)
port(s) for their communication. And having a draft that points out
the advantages of doing so is, by itself, useful.

>
> Section 4.2 - Can't really separate the topic of whether UDP is blocked
> from whether the STUN server is down.

The draft recommends multiple STUN servers for redundancy, but do we
really want to engage in a reduction to the absurd of "it's impossible
to diagnose network behavior because you can never differentiate
between host failure vs network failure in the absence of responses"?
True. But not interesting.

>
> Section 4.4 - this fails if the port was recently used for similar
> tests from the same STUN server. There's no way to know this as an
> application. This type of test can work in lab conditions where all
> traffic on the NAT is controlled but in operational networks it will fail.

I believe this question is adequately addressed (and limitations
discussed) in Section 4.1 (of the current draft). That section was
posted to the mailing list prior to IETF73, and given to you directly,
but I am not aware of receiving any comments from you reflecting its
presence.
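
As a rough illustration of the single-port point made in the Section
4.1 reply above, the sketch below opens one UDP socket that an
application could use both for a STUN test and for its own traffic.
The function names are hypothetical. The 20-byte Binding Request header
follows the base STUN layout in RFC 5389, which I am assuming is the
relevant one here; none of the behavior-discovery attributes are shown.

```python
import os
import socket
import struct

STUN_BINDING_REQUEST = 0x0001    # message type, per RFC 5389
STUN_MAGIC_COOKIE = 0x2112A442   # fixed magic cookie, per RFC 5389


def build_binding_request() -> bytes:
    """Build a bare Binding Request: 20-byte header, no attributes."""
    transaction_id = os.urandom(12)  # 96-bit random transaction ID
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id


def open_shared_socket(port: int = 0) -> socket.socket:
    """Open the one UDP socket used for both testing and application traffic.

    Port 0 lets the OS pick a free port; hypothetical helper, not from the draft.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    return sock
```

An application would send build_binding_request() from this socket to
its STUN server with sock.sendto(), then keep using the same socket
(and therefore the same local port) for its real traffic, so that what
the test observed applies to the port actually in use.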
>
> It is possible to do timing testing using just the change IP flag. The
> RESPONSE-TARGET stuff is not needed and opens up the possibility to
> have a STUN server send packets to places that it should not, which
> causes IDS systems to blacklist all traffic from the STUN server, thus
> making it unusable for other clients. The ability to tell the STUN
> server to send packets to arbitrary locations would be fine for a
> system in a lab used to characterize a NAT but is not a good idea for
> Internet-deployed STUN servers.

Please read the draft for the authentication and state required when
using XOR-RESPONSE-TARGET. Your comments do not apply to the current
(or recent) revisions of the draft. This issue has been extensively
discussed on the mailing list and in WG sessions, and resolved. This
question was also addressed in my response to your previous comments
in August (see below).

>
> The bulk of these issues were sent Aug 28 to the behave list during the
> 2nd WGLC. I requested agenda time during IETF 74 to discuss these
> issues but it was denied.

I'm including at the bottom of this message a copy of the issues
raised Aug 28 with my responses to them. Those issues were addressed
in the -05 revision in November. There has been no subsequent list
discussion of those topics.

>
> In summary - The approaches described in this draft are known to fail
> with many NATs. I don't see any evidence of the WG actually having
> read this draft much less having consensus on the approach in it.

I think the number of people providing comments both at the mic at the
various sessions and on the mailing list argues against this
statement.
In reviewing these comments, I came across this statement on whether
the applicability statement addressed the concerns about the draft
after it was moved to experimental:

----------------------------------------------------------
To: Behave WG <behave@xxxxxxxx>
From: Cullen Jennings <fluffy@xxxxxxxxx>
Date: Thu, 29 Nov 2007 22:13:09 -0800
Subject: [BEHAVE] behave-nat-behavior-discovery

I like the way you scope what this can and can not be used for. It
removed a lot of my concerns about it.

Cullen <with my individual hat on>
----------------------------------------------------------

which makes me wonder what has changed since then?

Bruce

> I think the WG should spend meeting time to discuss the topic and
> decide what to do. The key topic in my mind is whether we are defining
> a document that allows us to characterize a NAT in a lab or if we are
> trying to make something that works in the field and can be used to
> aid NAT traversal in applications.
>
> Cullen <in my role as individual contributor and ex chair of behave>
>
>
> On Mar 10, 2009, at 8:44 AM, The IESG wrote:
>
>> The IESG has received a request from the Behavior Engineering for
>> Hindrance Avoidance WG (behave) to consider the following document:
>>
>> - 'NAT Behavior Discovery Using STUN '
>> <draft-ietf-behave-nat-behavior-discovery-06.txt> as an Experimental
>> RFC
>>
>> The IESG plans to make a decision in the next few weeks, and solicits
>> final comments on this action. Please send substantive comments to
>> the ietf@xxxxxxxx mailing lists by 2009-03-31. Exceptionally,
>> comments may be sent to iesg@xxxxxxxx instead.
>> In either case, please
>> retain the beginning of the Subject line to allow automated sorting.
>>
>> The file can be obtained via
>> http://www.ietf.org/internet-drafts/draft-ietf-behave-nat-behavior-discovery-06.txt
>>
>>
>> IESG discussion can be tracked via
>> https://datatracker.ietf.org/public/pidtracker.cgi?command=view_id&dTag=15728&rfc_flag=0
>>
>> The following IPR Declarations may be related to this I-D:
>>
>> https://datatracker.ietf.org/ipr/919/
>> https://datatracker.ietf.org/ipr/945/
>>
>>
>> _______________________________________________
>> Behave mailing list
>> Behave@xxxxxxxx
>> https://www.ietf.org/mailman/listinfo/behave
>
> _______________________________________________
> Ietf mailing list
> Ietf@xxxxxxxx
> https://www.ietf.org/mailman/listinfo/ietf
>

Below is the response to Cullen's email of Aug 28, which includes
those questions inline. The -05 version addressed these concerns,
although frequently in different ways than described below because it
also reflected updates in response to Magnus' AD review.

On Tue, Sep 2, 2008 at 4:35 PM, Bruce Lowekamp <lowekamp@xxxxxxxxxxxxx> wrote:
> Sorry, meant to respond to this over the weekend.
>
> I'm sure these won't be the only issues raised that need clarification.
>
> Bruce
>
>
> Cullen Jennings wrote:
>>
>> Few comments
>>
>> Test 1: The first test defined in section 4.1. You have to have a good
>> way to distinguish no UDP connectivity from the case where the STUN
>> server is down or someone put in the wrong address.
>>
>
> That text should probably be clarified to remind the reader that the
> test applies only to connectivity to the particular STUN server.
> (and that either the client or server could be misconfigured). In
> general, though, that qualifier is at the beginning of the document
> and applies to everything in it.
>
>
>> Test 2: In test 4.2, I think it is important to identify that this test
>> has to be done for every single port the application wants to use
>> because we know that the results for different ports are often not the same
>>
>
> Will add a note that in some situations behavior may vary port-by-port.
> Actually, this should probably also be highlighted earlier in the
> document.
>
>
>> Test 3: This would be better if it mandated using a random source port
>> and highlighted that if any device had recently done test 2 on the same
>> port, this test will fail to get the correct result and it fails in a
>> way that suggests things will work that don't. It may sound odd to think
>> one might get the same port but often when an embedded system reboots,
>> it might run the same tests again at the same IP address and with ports
>> like this.
>>
>
> That's a good idea. Will clarify that in 3.2 and 4.3. Will also add
> some text to point out the interaction between this and the previous issue.
>
>
>> Section 4.4 - given the rate limiting of NATs, I would give some advice
>> that was more implementable than "care must be given". I'd specifically
>> rate limit to something like no more than X stun packets per second. It
>> would be nice to discuss here how long these tests can take even when
>> they are done in parallel.
>>
>
> Do you have a suggestion for the value of X? I don't think 4787
> explicitly addresses this.
>
>
>> Section 4.5 - the XOR-RESPONSE-TARGET just sort of comes out of nowhere and
>> is a bit hard to understand when reading the draft from front to back
>>
>
> It was used earlier in 3.3, but I agree it's not well defined there,
> either. Will try to make the introduction clearer.
>
>
>> The whole XOR-RESPONSE-TARGET has all the same security problems and
>> issues as TURN.
>> Instead of reinventing it all here, why not just use
>> TURN to be able to send the packets to where you want them?
>>
>
> I disagree that the same security issues are present (or at least in the
> same magnitude). In particular, XOR-RESPONSE-TARGET is even more
> limited in applicability in order to prevent it from being used for any
> significant type of attack. In addition to the precautions already in
> the current text, a previous revision required authentication for all
> uses of XOR-RESPONSE-TARGET, but many people objected to this being too
> strong compared to the potential threat this method offered, and instead
> group consensus (almost unanimous as I recall) was for allowing the
> current CACHE-TIMEOUT state/rate-limiting approach while allowing those
> who desire to still require authentication. While you're right that
> there is still a risk of a state attack on the server, the state
> required to store is very small, is stored only on transactions that
> request it, and the CACHE-TIMEOUT attribute provides feedback to the
> client whether the request can complete. Furthermore, the consequences
> of being unable to serve new requests due to a DoS attack on the server
> are not nearly as dire for behavior discovery as for TURN.
>
> Regarding TURN as a solution, that seems incredibly heavyweight for this
> application, although I don't see a reason not to say that this test
> could be implemented that way. You'd have to be careful, however, to
> make sure that neither end of the TURN connection is running any
> keepalives, which might be difficult since TURN specifies both STUN
> keepalives and TURN keepalives for its connections.
>
>
>> In section 6.1 where you have "the server must verify that it has
>> previously..."
>> I think this must needs to be a MUST
>>
>
> yes
>
>
>> I will note the RESPONSE-TARGET design forces the server to remember for
>> some time some state about every binding request.
>>
>
> From Section 5:
>
>    If a client intends to utilize an XOR-RESPONSE-TARGET attribute in
>    future transactions, as described in Section 4.5, then it MUST
>    include a CACHE-TIMEOUT attribute in the Request with the value set
>    greater than the longest time duration it intends to test.
>
> so it only needs to store state for binding requests that included the
> CACHE-TIMEOUT attribute.
>
>
>> Section 5.1 - the SRV service name needs to be in the IANA registry
>>
>
> true
>
>>
>>
>> Cullen <as an individual contributor>
>>
>
_______________________________________________
Ietf mailing list
Ietf@xxxxxxxx
https://www.ietf.org/mailman/listinfo/ietf