I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG. These comments were written primarily for the benefit of the
security area directors. Document editors and WG chairs should treat
these comments just like any other last call comments.

This document allows SIP endpoints to negotiate the use of a
circuit-switched (e.g. PSTN) channel. It presents a mechanism for
correlating an incoming circuit-switched connection with a given
SDP/SIP session by sending a nonce or a static string.

Summary: I would like to see a stronger authentication mechanism
defined to replace the static string or "plaintext password" nonce.

I am content with the analysis of security weaknesses: an attacker
could trick someone into initiating a potentially expensive PSTN call,
and the correlation mechanism is weak.

I am not content with the use of a mere nonce or static string for
correlation. That is the equivalent of sending plaintext passwords,
and I suspect we have better mechanisms available that could allow for
mutual endpoint authentication, making it statistically unlikely for a
man-in-the-middle to participate successfully in the correlation
exchange. The document makes a case for using short strings/nonces
(e.g. a caller-ID string or 10 DTMF digits). I suspect both that
those lengths could be extended without great pain and that some
stronger authentication mechanisms could work with such short strings,
especially given the ability to send longer keying material in the
packet-switched SDP session.
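
To make that suggestion concrete, here is a minimal sketch of one possible approach; it is not something the draft defines. It assumes the SDP exchange carries a random 128-bit key (which only helps if the signaling itself is integrity- and confidentiality-protected, e.g. SIP over TLS), and that each endpoint derives a 10-digit code from an HMAC-SHA256 over its role and the session identifier. The function names, the role labels and the session identifier are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical role labels; SDP calls the two parties offerer and answerer.
ROLES = ("offerer", "answerer")


def new_correlation_key() -> bytes:
    """Random 128-bit key, assumed to be carried in the (protected) SDP exchange."""
    return secrets.token_bytes(16)


def dtmf_code(key: bytes, session_id: str, role: str, digits: int = 10) -> str:
    """Derive a short all-digit code bound to one session and one sender role."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    # Role comes first and never contains "|", so the MAC input is unambiguous.
    msg = f"{role}|{session_id}".encode("utf-8")
    mac = hmac.new(key, msg, hashlib.sha256).digest()
    # Reduce the first 64 bits of the MAC to `digits` decimal digits;
    # the modulo bias at 10 digits is negligible.
    value = int.from_bytes(mac[:8], "big") % (10 ** digits)
    return str(value).zfill(digits)


def verify_code(key: bytes, session_id: str, role: str, received: str,
                digits: int = 10) -> bool:
    """Constant-time check of a code received over the circuit-switched leg."""
    expected = dtmf_code(key, session_id, role, digits)
    return hmac.compare_digest(expected, received)


if __name__ == "__main__":
    key = new_correlation_key()
    sid = "example-session-1"  # hypothetical session identifier
    offer_code = dtmf_code(key, sid, "offerer")
    answer_code = dtmf_code(key, sid, "answerer")
    print(offer_code, answer_code)
    print(verify_code(key, sid, "offerer", offer_code))  # True
```

Binding the code to the sender's role lets the answerer send back its own code over the circuit-switched channel, which gives a form of mutual authentication; binding it to the session means a code overheard on one call is useless for another. Ten digits still carry only about 33 bits, so this limits guessing rather than eliminating it.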
Non-security observation: I'm worried that the architecture of the
current correlation mechanism will have some unintended consequences.
From section 5.2.3:

   The endpoints should be able to correlate the circuit-switched bearer
   with the session negotiated with SDP in order to avoid ringing for an
   incoming circuit-switched bearer that is related to the session
   controlled with SDP (and SIP).

As I understand it, some of the defined variants of the correlation
scheme require answering the call (e.g. the DTMF scheme) before
knowing whether or not the channel corresponds to a SIP session. If
it does not, then what? The call is already answered, which gives a
surprising user experience. Feel free to tell me I'm off base with
this one.

-- Sam