Bernard,

This text was updated and included in the -07 draft. Thanks for the comments.

Bruce

On Mon, Apr 6, 2009 at 12:10 PM, Bernard Aboba <bernard_aboba@xxxxxxxxxxx> wrote:
> Bruce --
>
> Thanks for the reply. Your explanation provides some helpful background.
> Would you consider adding some of this material to the document?
>
>> Date: Sun, 5 Apr 2009 22:57:22 -0400
>> Subject: Re: Last Call: draft-ietf-behave-nat-behavior-discovery
>> (NAT Behavior Discovery Using STUN) to Experimental RFC
>> From: bbl@xxxxxxxxxxxx
>> To: bernard_aboba@xxxxxxxxxxx
>> CC: ietf@xxxxxxxx; behave@xxxxxxxx
>>
>> Bernard,
>>
>> Thanks for the comments. Let me see if I can describe a scenario in
>> which behavior-discovery is useful.
>>
>> First, we don't want to "go back to 3489." There were two problems
>> (well, there were a lot more problems, but I just want to talk about
>> two right now) in particular that we don't ever want to go back to:
>>
>> - 3489 specified that an application would start up, characterize its
>> NAT, and work in that mode forever after
>> - 3489 specified that if you had a friendly NAT, you could query the
>> STUN server for your transport address and use that one address
>>
>> At the same time, behavior-discovery is targeting applications for
>> which ICE doesn't necessarily make sense. For example, applications
>> that don't want to fall back to TURN, but have other options for how
>> to establish a connection. (whether this means indirect routing or
>> not needing the connection, or other reasons)
>>
>> So let me try to go into more details on a potential P2P application.
>> When P2P node A starts up, it evaluates its NAT(s) relative to other
>> nodes already in the overlay. Let's say that its testing indicates
>> it's behind a good NAT, with endpoint-independent mapping and
>> filtering.
>> In this case, the peer will join the overlay and establish
>> connections with appropriate peers in the overlay, but it will
>> advertise to any node in the overlay that wants to reach it that they
>> don't need to route through the overlay network formed by the P2P
>> nodes to reach it (which is the normal routing mode in a P2P overlay),
>> they can just send directly to its IP address.
>>
>> So when node B wants to send a message to A, it sends the message
>> directly to A's IP address and starts a timer. If it doesn't receive
>> a response within a certain amount of time, then it routes the message
>> to A across the overlay instead. (Alternatively, B could
>> simultaneously send the message to A's IP address and across the
>> overlay, which guarantees minimum response latency, but can waste
>> bandwidth.)
>>
>> A over time observes what percentage of the time it receives direct
>> messages compared to overlay messages. If the percentage of direct
>> connections is below some threshold (say 66%, picking a random number)
>> then it may stop advertising for direct connections. But if the
>> percentage is high enough, it continues to advertise because it may be
>> helping performance. If at some point, the NAT changes its behavior,
>> A will notice a change in its direct connection percentage and may
>> re-evaluate its decision to advertise a public address.
>>
>>
>> (There are a lot of other details how this might work, how it would
>> deal with multiple levels of NATs, and what the actual cost benefits
>> are. I don't want to get into all of the details of how it would work
>> here.)
>>
>> This is a good example because behavior-discovery is used for initial
>> operating mode selection, but the actual decision for whether to
>> continue advertising that public IP/port pair is made based on actual
>> operating data.
>> It's also using the result of the behavior-discovery
>> work as an optimization, not in a manner where the application will
>> fail if a percentage of the nodes in the overlay are unable to make a
>> connection.
>>
>> Bruce
>>
>>
>> On Sat, Apr 4, 2009 at 2:39 AM, Bernard Aboba <bernard_aboba@xxxxxxxxxxx>
>> wrote:
>> > Bruce Lowekamp said:
>> >
>> > "Many of the questions you raise point to the same question of whether
>> > tests or techniques that are known to fail on a certain percentage of
>> > NATs under a certain percentage of operating conditions are
>> > nevertheless valuable. behavior-discovery has an applicability
>> > statement
>> >
>> > http://tools.ietf.org/html/draft-ietf-behave-nat-behavior-discovery-06#section-1
>> > that discusses those issues in some detail. I spent enough time
>> > wording that statement and discussing it with various people that I
>> > think it is best to refer to that statement.
>> >
>> > You also repeatedly use phrases such as "basically won't work" and
>> > "it might work." That comes down to the value of "certain percentage"
>> > as used above. My experience with these techniques, and the
>> > experience of those who have used such techniques recently, is that
>> > they are far more reliable than that, into the 90% range, particularly
>> > when used correctly.
>> > That is not high enough that we could go back to
>> > 3489---all techniques require fallbacks because they fail, and 90% is
>> > far, far too low of a success rate---but it is high enough that
>> > applications can make useful decisions based on that information,
>> > provided they have a fallback in cases where the information is wrong.
>> > And those are the conditions of the experiment."
>> >
>> > What I am failing to understand is the distinction between those
>> > situations in which we "cannot go back to RFC 3489" and the scenarios
>> > envisaged for the experiment.
>> >
>> > Presumably, situations in which we "cannot go back to RFC 3489"
>> > include Internet telephony, which may be used for life-critical
>> > situations such as E911. For those kinds of scenarios, we need
>> > traversal technologies that are as reliable as possible, and are
>> > willing to live with the complexity of ICE to achieve this.
>> >
>> > The draft mentions P2P applications as one potential situation in
>> > which usage of imperfect techniques is acceptable, and yet the
>> > IETF currently has the P2PSIP WG, which is involved in the
>> > development of technology for usage of SIP over P2P networks.
>> > In that kind of application, wouldn't the reliability requirements
>> > be similar to those in which we "cannot go back to RFC 3489"?
>> >
>> > This led me to think about the requirements for the diagnostic
>> > scenarios that are also discussed in the document. In existing
>> > deployments it is often challenging to figure out the reasons
>> > why traversal is unsuccessful, and what can be done to improve
>> > the overall success rate. Data suggests that there are even
>> > common situations in which ICE will fail.
>> > But in thinking
>> > through how to approach diagnosis under those conditions,
>> > I'd currently be more inclined to start from the addition of
>> > diagnostics to an ICE implementation than to focus on the
>> > use of the diagnostic mechanisms described in the draft.
>> >
>> > So while I'm generally sympathetic to the idea that there
>> > are situations in which "less than perfect" techniques can
>> > be useful, in practice a number of common situations
>> > where NAT traversal is used today (such as life-critical
>> > Internet telephony) do not seem to fit into that bucket.
>> >
>> > It could be that I didn't quite understand the examples
>> > given in the applicability statement, or that I'm putting
>> > too much emphasis on corner conditions, because that is
>> > what customers tend to complain about.
>> >
>> > However, overall the document left me unclear about the
>> > rationale by which the material deprecated in RFC 3489
>> > was being re-introduced. While it does seem possible
>> > to construct a rationale for this, the document doesn't
>> > provide enough background to get me over that hump.
>> >
>> > _______________________________________________
>> > Ietf mailing list
>> > Ietf@xxxxxxxx
>> > https://www.ietf.org/mailman/listinfo/ietf
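
The heuristic described in the thread for node A (keep advertising a direct address only while enough messages actually arrive directly) can be sketched roughly as follows. This is a minimal illustration and not code from the draft: the function name, the `min_samples` guard, and the default values are assumptions made for the example, and 0.66 is just the arbitrary figure picked in the thread.

```python
def should_advertise_direct(direct_count: int,
                            overlay_count: int,
                            threshold: float = 0.66,
                            min_samples: int = 20) -> bool:
    """Decide whether a node keeps advertising its public address.

    direct_count  -- messages that arrived directly at the advertised address
    overlay_count -- messages that had to be routed across the overlay
    threshold     -- minimum fraction of direct arrivals (illustrative value)
    min_samples   -- hypothetical guard: below this many observations, keep
                     the mode chosen at startup from behavior-discovery
    """
    total = direct_count + overlay_count
    if total < min_samples:
        # Too little operating data; stay with the initial decision.
        return True
    return direct_count / total >= threshold


if __name__ == "__main__":
    print(should_advertise_direct(90, 10))  # mostly direct arrivals
    print(should_advertise_direct(10, 90))  # mostly overlay arrivals
```

The startup result from behavior-discovery only selects the initial mode here; the ongoing decision depends on observed traffic, which is the split between discovery and operating data that the thread argues for.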