Bernard,

Thanks for the comments. Let me see if I can describe a scenario in which behavior-discovery is useful.

First, we don't want to "go back to 3489." There were two problems (well, there were a lot more problems, but I just want to talk about two right now) in particular that we don't ever want to go back to:

- 3489 specified that an application would start up, characterize its NAT, and work in that mode forever after.
- 3489 specified that if you had a friendly NAT, you could query the STUN server for your transport address and just use that address.

At the same time, behavior-discovery is targeting applications for which ICE doesn't necessarily make sense: for example, applications that don't want to fall back to TURN but have other options for establishing a connection (whether because indirect routing is acceptable, the connection isn't strictly needed, or for other reasons).

So let me try to go into more detail on a potential P2P application. When P2P node A starts up, it evaluates its NAT(s) relative to other nodes already in the overlay. Let's say its testing indicates it's behind a good NAT, with endpoint-independent mapping and filtering. In this case, the peer will join the overlay and establish connections with appropriate peers, but it will also advertise to any node that wants to reach it that there is no need to route through the overlay network formed by the P2P nodes (the normal routing mode in a P2P overlay); messages can be sent directly to its IP address.

So when node B wants to send a message to A, it sends the message directly to A's IP address and starts a timer. If no response arrives within a certain amount of time, B routes the message to A across the overlay instead. (Alternatively, B could send the message to A's IP address and across the overlay simultaneously, which guarantees minimum response latency but can waste bandwidth.)
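To make the fallback concrete, here is a rough Python sketch of what B's send path might look like. The overlay.route() call and the 2-second timeout are placeholders for whatever routing primitive and retransmission policy the overlay actually provides; treat this as an illustration, not a spec:

    import socket

    DIRECT_TIMEOUT = 2.0  # placeholder value; a real node would tune this

    def send_to_peer(message, peer_addr, overlay):
        # Try the advertised public address first; fall back to routing
        # across the overlay if the timer expires with no reply.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(DIRECT_TIMEOUT)
        try:
            sock.sendto(message, peer_addr)   # direct attempt
            reply, _ = sock.recvfrom(65535)   # the "timer" is the socket timeout
            return reply
        except socket.timeout:
            # No direct response in time; use normal overlay routing.
            return overlay.route(peer_addr, message)
        finally:
            sock.close()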
Over time, A observes what percentage of its incoming messages arrive directly compared to across the overlay. If the percentage of direct deliveries falls below some threshold (say 66%, picking an arbitrary number), it may stop advertising for direct connections. But if the percentage is high enough, it continues to advertise, because the advertisement may be helping performance. If at some point the NAT changes its behavior, A will notice the change in its direct-delivery percentage and can re-evaluate its decision to advertise a public address.
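The bookkeeping on A's side might look something like the sketch below. The class name and the sliding-window size are invented for illustration; the 66% threshold is the arbitrary one from above:

    from collections import deque

    class DirectAdvertiser:
        def __init__(self, threshold=0.66, window=1000):
            self.threshold = threshold
            self.history = deque(maxlen=window)  # True = message arrived directly
            self.advertising = True

        def record(self, arrived_directly):
            self.history.append(arrived_directly)

        def reevaluate(self):
            if not self.history:
                return self.advertising
            direct_fraction = sum(self.history) / len(self.history)
            # A NAT behavior change shows up as a falling direct fraction,
            # which turns the advertisement off (or back on if it recovers).
            self.advertising = direct_fraction >= self.threshold
            return self.advertising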
(There are a lot of other details about how this might work, how it would deal with multiple levels of NAT, and what the actual costs and benefits are. I don't want to get into all of those details here.)

This is a good example because behavior-discovery is used for the initial selection of operating mode, but the actual decision about whether to continue advertising that public IP/port pair is made based on real operating data. It also uses the result of the behavior-discovery work as an optimization, not in a manner where the application will fail if some percentage of the nodes in the overlay are unable to make a direct connection.

Bruce

On Sat, Apr 4, 2009 at 2:39 AM, Bernard Aboba <bernard_aboba@xxxxxxxxxxx> wrote:

> Bruce Lowekamp said:
>
> "Many of the questions you raise point to the same question of whether tests or techniques that are known to fail on a certain percentage of NATs under a certain percentage of operating conditions are nevertheless valuable. behavior-discovery has an applicability statement, http://tools.ietf.org/html/draft-ietf-behave-nat-behavior-discovery-06#section-1, that discusses those issues in some detail. I spent enough time wording that statement and discussing it with various people that I think it is best to refer to that statement.
>
> You also repeatedly use phrases such as "basically won't work" and "it might work." That comes down to the value of "certain percentage" as used above. My experience with these techniques, and the experience of those who have used such techniques recently, is that they are far more reliable than that, into the 90% range, particularly when used correctly. That is not high enough that we could go back to 3489 (all techniques require fallbacks because they fail, and 90% is far, far too low a success rate), but it is high enough that applications can make useful decisions based on that information, provided they have a fallback for the cases where the information is wrong. And those are the conditions of the experiment."
>
> What I am failing to understand is the distinction between the situations in which we "cannot go back to RFC 3489" and the scenarios envisaged for the experiment.
>
> Presumably, situations in which we "cannot go back to RFC 3489" include Internet telephony, which may be used for life-critical situations such as E911. For those kinds of scenarios, we need traversal technologies that are as reliable as possible, and we are willing to live with the complexity of ICE to achieve that.
>
> The draft mentions P2P applications as one potential situation in which the use of imperfect techniques is acceptable, and yet the IETF currently has the P2PSIP WG, which is developing technology for the use of SIP over P2P networks. In that kind of application, wouldn't the reliability requirements be similar to those in which we "cannot go back to RFC 3489"?
>
> This led me to think about the requirements for the diagnostic scenarios that are also discussed in the document. In existing deployments it is often challenging to figure out why traversal is unsuccessful and what can be done to improve the overall success rate. Data suggests that there are even common situations in which ICE will fail. But in thinking through how to approach diagnosis under those conditions, I would currently be more inclined to start from adding diagnostics to an ICE implementation than to focus on the diagnostic mechanisms described in the draft.
>
> So while I'm generally sympathetic to the idea that there are situations in which "less than perfect" techniques can be useful, in practice a number of common situations where NAT traversal is used today (such as life-critical Internet telephony) do not seem to fit into that bucket.
>
> It could be that I didn't quite understand the examples given in the applicability statement, or that I'm putting too much emphasis on corner conditions, because that is what customers tend to complain about.
>
> However, overall the document left me unclear about the rationale for re-introducing the material deprecated in RFC 3489. While it does seem possible to construct such a rationale, the document doesn't provide enough background to get me over that hump.