Re: Tsvart last call review of draft-ietf-ice-rfc5245bis-16

Magnus Westerlund <magnus.westerlund@xxxxxxxxxxxx> · Thu, 1 Feb 2018 11:48:34 +0100

Hi,

Please see my comments inline.

Den 2018-01-30 kl. 15:34, skrev Christer Holmberg:
Hi,

Please see inline. As requested, I include my replies to the comments you
already provided earlier.

Significant Issues:

A. Section 5.2:

    Lite implementations only utilize host candidates.  A lite
    implementation MUST, for each component of each data stream, allocate
    zero or one IPv4 candidates.  It MAY allocate zero or more IPv6
    candidates, but no more than one per each IPv6 address utilized by
    the host.  Since there can be no more than one IPv4 candidate per
    component of each data stream, if an ICE agent has multiple IPv4
    addresses, it MUST choose one for allocating the candidate.  If a
    host is dual-stack, it is RECOMMENDED that it allocate one IPv4
    candidate and one global IPv6 address.  With the lite implementation,
    ICE cannot be used to dynamically choose amongst candidates.
    Therefore, including more than one candidate from a particular scope
    is NOT RECOMMENDED, since only a connectivity check can truly
    determine whether to use one address or the other.

I find it quite strange that the above text says there can only be single
IPv4
based candidate, while for IPv6 a LITE implementation may have one
candidate
per IPv6 address. Isn't the LITE implication of having multiple
candidates for
the same address family similar? Yes, IPv6 kind of forces the need for
dealing
with multiple IPv6 addresses on any host. However, I can see that certain
servers will actually be multi-homed in IPv4 and thus can in a sensible
way
actually have multiple IPv4 candidates, and let the clients select which
interface has the best reachability.

Can you please be explicit on what in ICE prevents things to work for
IPv4 but
the same case works for IPv6?
This is text from RFC 5245. I agree it is confusing, and unfortunately I
don’t have a good answer.

I guess my approach would be to suggest that we simply remove the
restriction. In addition, there is generic text about dual-stack etc
elsewhere,
and I don’t see anything ICE lite specific.

OLD:

"Lite implementations only utilize host candidates.  A lite
    implementation MUST, for each component of each data stream, allocate
    zero or one IPv4 candidates.  It MAY allocate zero or more IPv6
candidates, but no more than one per each IPv6 address utilized by
    the host.  Since there can be no more than one IPv4 candidate per
    component of each data stream, if an ICE agent has multiple IPv4
    addresses, it MUST choose one for allocating the candidate.  If a
    host is dual-stack, it is RECOMMENDED that it allocate one IPv4
    candidate and one global IPv6 address.  With the lite implementation,
    ICE cannot be used to dynamically choose amongst candidates.
    Therefore, including more than one candidate from a particular scope
    is NOT RECOMMENDED, since only a connectivity check can truly
    determine whether to use one address or the other."

NEW:

"Lite implementations only utilize host candidates.
With the lite implementation, ICE cannot be used to dynamically choose
amongst candidates. Therefore, including more than one candidate from a
particular IP address family is NOT RECOMMENDED, since only a connectivity
check can Truly determine whether to use one address or the other."

I have read Ben's comment on this issue. I think something needs to be 
done to avoid the confusion. There are I think two alternatives,
either retain the restriction with an added note on why this restriction 
exists. The second I think is to go with an amended version of the text 
proposal, possibly with even clearer wording that if you are running 
multiple addresses in the same family, you should really should be using 
a full implementation.

I noted that the "NEW" text above needs an amendment, and that is 
because I think a very important part of what was cut needs to be retained.

It MAY allocate zero or more IPv6
candidates, but no more than one per each IPv6 address utilized by
   the host.

If one generalize this what is missing is the limitation that only a 
single candidate per IP address utilized by the host can be added by a 
lite implementation. Using multiple would be redundant, but lets use a 
direct statement to this affect, rather than a subordinate clause to  a 
MAY statement.

---

B. Section 6.1.1:

    An agent MUST be prepared that the peer might re-determine the roles
    as part of any ICE restart, even if the criteria for doing so are not
    fulfilled.  This can happen if the peer is compliant with an older
    version of this specification.

What does it mean to be prepared for a peer that re-determine the roles?
What
is it one MUST do? If the peer changes its role upon an ICE restart,
isn't that
going to result in a role mismatch? Thus causing yet another ICE restart,
where
also this peer will re-evalute? Isn't that good enough? Or is it
something else
it can do?
The roles are re-negotiated during the ICE restart: it may, or may not,
result in a role mismatch.

"Prepared" means that the peer might change its role even though it does
not fulfil the 5245bis criteria for being allowed to do so.

Yes, but prepared is not actionable. I guess what you mean is that an 
implementation MUST accept and perform re-determination of the role 
initiated by the peer agent, even if the criteria are not fulfilled.

---

C. Section 6.1.3:

    The ICE agent has a state determined by the state of the check lists.
    The state is Completed if all check lists are Completed, Failed if
    all check lists are Failed, and Running otherwise.

Does failed really require all the checklists to fail, or simply any to
fail if
the others are completed?
If there are one or more completed, the session can still continue. As
described in section 8.1.2:

"If at least one of the check lists for other data streams is
     Completed, the controlling agent SHOULD remove the failed data
     stream from the session while sending updated candidate list to
     its peer."

---

Okay, so the statement above is actually fulfilled, by pruning the 
failed checklists through signalling, so that all lists are completed.

I reacted, because it looked like you can actually arrive at a result 
where some checklists are completed and some are failed. And if I 
interpret this it should always be considered as running, as more can be 
done before things are completed.

I guess there is no real need for additional text here. Just a sense 
from me as a reader that statements looks wrong, even if it isn't.

D. Section 6.1.4.2:

I don't know if I misunderstand the algorithm here in the bullet list. To
me it
appears that it will terminate prior to have initiated all possible
tests, as
it appears that it will not unfreeze some of the candidate pairs. If one
have
tests running for a foundation, but all other candidate checks have been
started, then the steps are aborted. Is the bullet list rechecked every
Ta?
No. Whenever Ta fires, one check list in Running state is checked. When
all check lists have been checked, it will start over from the top of the
list.

Or, did I misunderstand your issue?

No, It was unclear that this is re-run every time Ta fires. I think the 
reason for this is this sentence in step 4.

If this happens for every single check list in the
       Running state, meaning there are no remaining candidate pairs to
       perform connectivity checks for, abort these steps.

It was unclear to me that this doesn't mean aborting the whole 
procedure. Because if the requirement is fulfilled then there is nothing 
to do until the next Ta, which will possible unfreeze another candidate 
pair in step 2. I would recommend that you reformulate this sentence to 
make it clear that the processing for this Ta is stopped, and processing 
will be resumed the next time Ta fires.

---

E. Section 7.2.5.2.2.  ICMP Error

    An ICE agent MAY support processing of ICMP errors for connectivity
    checks.  If the agent supports processing of ICMP errors, and if a
    Binging request generates an ICMP error, the agent SHOULD set the
    state of the candidate pair to Failed.

I am a bit worried by this blanket statement on ICMP errors. I think it
should
be clarified which ICMP message types that are relevant to consider as
errors?
I assume Type 3 (Destination Unreachable) but maybe not all responde
codes as
Codes 4, 11,12 may be addressable in other ways, and likely Type 11 (Time
exceeded) with response code 0, response code 1 is not a clear indication
of a
non working path.
This is from RFC 5245.

I don’t think the ICE WG should go through all different codes and
combinations, and determine what should be considered an error, and what
not.

If you can provide something (table, guidance etc), we are happy to
include it. Otherwise I’d like to keep it as it is, and let
implementations deal with it, as at least I am not aware that this would
Have caused issues in ICE deployments.

I think we there is a point to clarify that this applies to ICMP 
messages indicating a non-usable path. So maybe it could be rewritten to 
something like this:

   An ICE agent MAY support processing of ICMP messaging indicating a non-functioning path for connectivity
   checks. ICMP messages of type 3 (Destination Unreachable) are indicators of a currently non-functioning path. However, the issue can be temporary as it can depend on routing transients, this for example applies to codes 0, 1 and 5. Other messages that appear to indicate non-functioning path such as Type=11 (Time Exceeded) with code=0 (Time to Live exceeded in Transit) are not clear indicator as the IP packets potentially can reach the destination with a larger TTL value set at transmission. Therefore, implementation needs to analyse the different ICMP messages types and codes for which it considers the path as non-functioning. If the agent supports processing of ICMP errors, and if a
   Binging request generates an ICMP error, the agent SHOULD set the
   state of the candidate pair to Failed.

What also is not addressed in this is the risk of denial of service 
attacks using spoofed ICMP messages to shutdown certain connectivity 
checks. The security considerations lack any discussion of this issue. 
If ICMP processing are retained, I think a recommendation about 
validation is needed to avoid at least off path attackers from doing 
these attacks easily. Unfortunately ICMP response will only include the 
IP/UDP header, thus no data from the STUN messages which would allow 
verification that the ICMP messages matches an actually sent check.

It may be simplest to recommend against reacting to ICMP errors from 
both the perspective that it is a risk for denial of service attack, as 
well as that it represents a risk terminating connectivity checks for a 
transient issue. From my perspective I expect this to reduce the number 
of sent connectivity checks very little

---

F. Section 7.2.5.2.3.  Timeout

    If the Binding request times out, the ICE agent MUST set the
    candidate pair state to Failed.

Isn't this erroneous? Timeout for the connectivity check is happening
when all
the (re-)transmissions have timed out, isn't it? or as simple as missing
the
word "transaction"?
Correct. I will change to "Binding request transaction"

Good!

---

G. Section 14.3:

     Num-Of-Pairs: the number of pairs of candidates
     with STUN or TURN servers.

I don't understand this definition. What does "with STUN or TURN servers"
mean?

Candidate pairs where the local side is server-reflexive or relay?
The text comes from RFC 5245, but I think it’s wrong. There are no pairs
during the gathering phase.

My suggestion is to change Num-Of-Pairs to Num-Of-Cands, and say:

"Num-Of-Cands: the number of server-reflexive and relay candidates"

Sounds good to me.

---

H. Section 14.3:

     Num-Waiting: the number of checks in the check list in the
     Waiting state.

     Num-In-Progress: the number of checks in the In-Progress state.

Is "the number of checks" only per single checklist or across all the
check
lists?
Per single check list.

Section 14.1 says: "Those formulas scale
   with N, the number of checks to be performed."

But, from my perspective, that number N is not on a single checklist, 
but for the whole ICE session, i.e. all the checklists. Thus, I think 
there needs a clarification on why this is per checklist, making it 
clear why these two things match up.

---

I. Section 17.2.3:

When VAD is being
   used, keepalives will be sent during silence periods.

I would claim that this is only true for when VAD without any comfort
noise is
used. A lot of codecs with VAD operations still generates comfort noise
on a
frequency of a couple packets a second, way more often then the minimal
for ICE
keep-alives.
What about:

"In deployments that are not utilizing Voice Activity Detection (VAD),
without any comfort noise,
the keepalives are never..."

That is also not true. As keep-alives is not expected to be needed for:
1. Continous media withouth any VAD etc.
2. Media with VAD, but with comfort noise not allowing intervals to be 
to long (<=1 second).

=======

Minor/Editorial Issues:

1.  Section 5.1.2:

This section doesn't make it clear that higher priority values are more
prioritized over lower values. That really should be defined here. Now
that
information only becomes evident implicitly in section 5.1.2.1.

I suggest the following:

"This priority will be used by ICE to determine the order of the
    connectivity checks and the relative preference for candidates.
Higher priority values give more priority over lower values."

Good with me.

---

2. Section 2.1.

    In order to execute ICE, an ICE agent has to identify all of its
    address candidates.

I think this sentence is raising a too high requirement. An ICE agent has
attempt to identify as many of the address candidates as possible. The
better
coverage of the potential candidates the more likely it is to function. I
would
also argue that there are multiple cases where you will not figure out
that
there are candidates that you don't know about. An obvious example is in
cases
of two NATs between the local address realm and the STUN server, the agent
can't figure out that there was an address given to the flow in the middle
address realm between the two NATs. That can be only learn if the agent
has a
STUN server in that address realm. Secondly, there are cases where policy
may
be applied to exclude certain interfaces and their related candidates.
I suggest to replace with first sentence and simply say:

"In order to execute ICE, an ICE agent identifies and gathers one or more
address candidates."
Yes, works fine in this introduction context.

I also noted that this first paragraph and the second has a strange
relation.
The first part of first paragraph is general, then there is the part of
the
host candidate. Then the second part starts with STUN and TURN derived
candidates. Maybe the first paragraph should be split between the general
and
the host part, or some other bridging is needed.
I suggest to split the first paragraph into two paragraphs, where the
second paragraph begins with the "At least one viable candidate뒰�
sentence.

Looks like it should work.

---

3. Section 2.3:

If the transactions above succeed, the agents will set
    the nominated flag for the pairs, and will cancel any future checks
    for that component of the data stream.

Although what is stated is normal, it is not guaranteed to happen, I know
this
is intended as a simplified overview, but ignoring that there can occur
some
shuffling back and forth if high priority checks complete after low
priority
ones should at least be hinted or at least allowed by the use of words.
One of the tasks have been to simplify section 2, because it was too
detailed for people who just wanted to get an overview of ICE. For
example, we removed the text about role conflicts.

So, I would prefer to not cover your case in section 2. I don¹t think it
is essential for people who want to get an overview. Implementers
obviously will need to read the whole spec.

Okay, I understand the point.

---

4. Section 3.

As RFC 7825 do describe a significant enough different usage of ICE from
SIP, I
think it would be good to actually included an informational reference to
this
usage.
I can do that. However, note that RFC 7825 references RFC 5245. It even
references specific sections, which may not be the same in 5245bis.

In addition, RFC 7825 uses ³aggressive nomination² terminology, which has
been removed from 5245bis.

What about:

"RFC 7825 defines an ICE usage for the Real-Time Streaming Protocol
(RTSP). Note, however, that the ICE usage is based on RFC 5245."

Yes, I think that is very fair statement. But there is a point of 
actually indicating that there are different usages therefore I think it 
make sense to include that sentence.

---

5. Section 5.1.1.4:

An ICE agent SHOULD
    monitor the interfaces it uses, invalidate candidates whose base has
    gone away, and acquire new candidates as appropriate when new
    interfaces appear.

I am missing discussion of new addresses here. If the base disappears, it
might
be that there is a new IP address that one should use. That doesn't
necessary
imply a new interface.
What about:

"Host candidates do not time out, but the candidate addresses may
change or disappear for a number of reasons. An ICE agent SHOULD
monitor the interfaces it uses, invalidate candidates whose base has
gone away, and acquire new candidates as appropriate when new
<new>IP addresses (on new or currently used interfaces)</new> appear."

Looks good to me.

---

6. Section 7.2.5.1:

    If the Binding request generates a 487 (Role Conflict) error
    response, and if the ICE agent included an ICE-CONTROLLED attribute
    in the request, the agent MUST switch to the controlling role. If
    the agent included an ICE-CONTROLLING attribute in the request, the
    agent MUST switch to the controlled role.

I think the first sentence should have a forward reference to Section
7.3.1.1
where the rest of the solution is described.
I will add a reference.

Ok

---

7. Section 7.3:

If the agent is using Diffserv Codepoint markings [RFC2475] in its
    data packets, it SHOULD apply the same markings to Binding responses.

I find this sentence a bit unclear. Is it intended to say:

If the agent receiving the binding request, intended to use DSCP markings
!=0
for the data, it SHOULD set, the same marking to binding responses.

or

If the agent receives a binding request with DSCP markings, then it should
apply to corresponding code point when forming the binding response?
It means that it will use the same markings in Binding responses that it
uses in data packets (audio, video, etc).

Okay, I wonder if it is the temporal aspect of the sentence that makes 
it harder for me to parse correctly.

There are unclarity of which agent is referenced and whom "it" is in the
sentence.

It is the STUN server.

Would the following be more clear?

"If the agent is using Diffserv Codepoint markings [RFC2475] in data
packets that it sends, the agent SHOULD apply the same markings to Binding
responses."

Yes, except that I stumbles "it sends" where it is more likely "it will send" is the correct form.

8. Section 8.3.1:

   The procedures in Section 8 require that an ICE agent continue to
   listen for STUN requests and continue to generate triggered checks
   for a data stream, even once processing for that stream completes.

That reference to Section 8, should that in fact be to Section 8.1
specifically? It looks strange with a self reference, which in some
aspect a
reference to section 8 means.

I think this issue was missed.

---

9. Section 15:

4.57566E+18 (note that
   an implementation would represent this as a 64-bit integer so as not
   to lose precision).

Why the floating point representation? Priorities are integer numbers and
thus
should be presented as such in this example.
This is from RFC 5245, and unfortunately I don’t know.

Can you not just calculate the 64-bit integer and write it out?

---

10. Section 17.2.1:

   First and foremost, ICE makes use of TURN and STUN servers, which
   would typically be located in the network operator's data centers.

Is really network operator's data centers the right entity here? I would
claim
it is the service operators data centers, they may contract STUN services
from
network operators, but if the service operator isn't providing any STUN
service
for its own service, then ICE is unlikely to work.
Could we simply say “located in data centers”?

Yes.

---

11. Section 18.2:

Local IPv6 addresses can be preferred.

I think this sentence needs to clarify that it means local to host,
rather than
any form of relayed or translated address, rather than a local scope only
IPv6
address.
I could say:

"Local IPv6 host addresses"

Maybe better:

IPv6 address from the local host can be preferred.

---

12. Section 18.5:

   A number of NAT boxes are now being deployed into the market that try
   to provide "generic" ALG functionality.  These generic ALGs hunt for
   IP addresses, either in text or binary form within a packet, and
   rewrite them if they match a binding.

Are actually these generic ALG functionality relevant today? They proved
to be
a very bad idea very quickly. A note that this was a consideration at the
time
RFC 5245 was published.
I don’t know whether the functionality is relevant. But, I guess it
doesn’t hurt to have the text.

Fine, I guess it does no real harm.

Cheers

Magnus Westerlund

----------------------------------------------------------------------
Media Technologies, Ericsson Research
----------------------------------------------------------------------
Ericsson AB                 | Phone  +46 10 7148287
Torshamnsgatan 23           | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden | mailto: magnus.westerlund@xxxxxxxxxxxx
----------------------------------------------------------------------

<<attachment: smime.p7s>>