Reviewer: Dale Worley
Review result: Ready with Nits
I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair. Please treat these comments just
like any other last call comments.
For more information, please see the FAQ at
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
Document: draft-ietf-slim-negotiating-human-language-06
Reviewer: Dale R. Worley
Review Date: 2017-02-17
IETF LC End Date: 2017-02-20
IESG Telechat date: [unknown]
Summary:
This draft is basically ready for publication, but has nits
that should be fixed before publication.
* Technical comments
A. Call failure
If a call fails due to no available language match, in what way(s)
does it fail? Section 5.3 says
If such an offer is received, the receiver MAY
reject the media, ignore the language specified, or attempt to
interpret the intent
But I suspect it's also allowed for the UAS to fail the call at the
SIP level. Whether or not that is allowed (or at least envisioned)
should be described. And what response code(s)/warn-code(s) should be
used for that?
B. Audio/Video coordination
5.2. New 'humintlang-send' and 'humintlang-recv' attributes
Note that while signed language tags are used with a video stream to
indicate sign language, a spoken language tag for a video stream in
parallel with an audio stream with the same spoken language tag
indicates a request for a supplemental video stream to see the
speaker.
And there's a similar paragraph in 5.4:
A spoken language tag for a video stream in conjunction with an audio
stream with the same language might indicate a request for
supplemental video to see the speaker.
I think this mechanism needs to be described more exactly, and in
particular, it should not depend on the UA understanding which
language tags are spoken language tags. It seems to me that a
workable rule is this: when an audio stream and a video stream
specify exactly the same language tag in their respective
humintlang attributes, that is a request for a spoken language
with simultaneous video of the speaker, and the request should be
considered satisfied only if both streams can be established.
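
A minimal Python sketch of such a rule, assuming the m= sections have
already been parsed into (media type, send tags, recv tags) tuples. The
function names and the tuple layout are hypothetical and do not come
from the draft, and the sketch simplifies by pooling the send and recv
tags of each stream. The point is that the check compares tags only and
never needs to know whether a tag denotes a spoken or a signed language.

```python
from collections import defaultdict


def paired_av_requests(media):
    """Return the language tags that appear on both an audio and a video stream.

    `media` is a list of (media_type, send_tags, recv_tags) tuples, a
    simplified stand-in for parsed SDP media sections (an assumption of
    this sketch, not a structure defined by the draft).
    """
    tags_by_type = defaultdict(set)
    for media_type, send_tags, recv_tags in media:
        # Language tags are case-insensitive, so normalize before comparing.
        tags_by_type[media_type].update(
            t.lower() for t in (*send_tags, *recv_tags)
        )
    # A tag present on both audio and video signals the paired request.
    return tags_by_type["audio"] & tags_by_type["video"]


def paired_request_satisfied(tag, established_media):
    """True only if both the audio and the video stream were established with `tag`."""
    return tag.lower() in paired_av_requests(established_media)
```

For example, an offer carrying "en" on both an audio and a video stream
yields {"en"}, and the request counts as satisfied only if the answer
keeps both of those streams.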
* The following three items are adjustments to the design which I'd
like to know have been considered.
C. "humintlang" seems long to me
Given the excessive length of SDP in practice, it seems to me that a
shorter attribute name would be desirable. E.g., "humlang" as was
used in some previous versions. Or is there a coordinated usage with
other names in the "hum*lang" pattern?
D. Use the Accept-Language syntax
It seems to me that it would be better to use the Accept-Language syntax
for the attribute values. This allows (1) specifying the quality of
language experience, allowing clear description of bilingualism, (2) a
unified method of specifying whether or not arbitrary languages are
acceptable, and (3) abbreviating SDP descriptions.
In a way, the fact that the current proposal seems to require (but
does not directly specify) the coordinated absence/presence of an
asterisk on all of the repetitions of humintlang-send or
humintlang-recv is a warning that the syntax doesn't represent the
semantics as well as it might.
E. Have an attribute to abbreviate the bidirectionally-symmetric case
Note that all examples are bidirectionally symmetric, and the text
says that requests and responses SHOULD be bidirectionally symmetric.
So it would be a very useful abbreviation to define
"humintlang=<value>" to be equivalent to the combination of
"humintlang-send=<value>" and "humintlang-recv=<value>".
Combining proposals C, D, and E, the examples become
m=audio 49170 RTP/AVP 0
a=humlang:en
m=video 51372 RTP/AVP 31 32
a=humlang:ase,*;q=0.1
m=audio 49250 RTP/AVP 20
a=humlang:es,eu;q=0.9,en;q=0.8,*;q=0.1
m=text 45020 RTP/AVP 103 104
a=humlang:gr
which requires about half as many characters as they have now.
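
To illustrate how a receiver might process the combined form, here is a
rough Python sketch. It assumes the proposed "humlang" attribute name,
an Accept-Language-style value list, and the symmetric-shorthand
semantics of proposal E; none of this is in the draft, and the function
names are hypothetical.

```python
def parse_language_preferences(value):
    """Parse an Accept-Language-style list into (tag, q) pairs, best first."""
    prefs = []
    for item in value.split(","):
        tag, _, params = item.strip().partition(";")
        q = 1.0  # a missing q parameter means full preference
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key.lower() == "q" and val:
                q = float(val)
        prefs.append((tag.strip(), q))
    # sorted() is stable, so tags with equal q keep their listed order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def expand_humlang(attribute_line):
    """Expand the hypothetical a=humlang:<list> shorthand into both directions."""
    name, _, value = attribute_line.partition(":")
    if name != "a=humlang":
        raise ValueError(f"not a humlang attribute: {attribute_line!r}")
    prefs = parse_language_preferences(value)
    # Proposal E: the shorthand means the same preferences for send and recv.
    return {"send": prefs, "recv": list(prefs)}
```

With the audio line from the example above,
expand_humlang("a=humlang:es,eu;q=0.9,en;q=0.8,*;q=0.1") gives the list
[("es", 1.0), ("eu", 0.9), ("en", 0.8), ("*", 0.1)] for both directions.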
* Editorial comments and nits
Abstract
This document describes the need and a solution using new SDP stream
attributes.
I don't think the term "stream attribute" is used in RFC 4566.
Instead, it uses "media attribute".
1. Introduction
caller and callee know each other or there is contextual or out of
band information from which the language(s) and media modalities can
I think that in this context it's preferred to hyphenate "out-of-band"
to make it clearly an adjective.
This approach has a number of benefits, including that it is generic
(applies to all interactive communications negotiated using SDP) and
not limited to emergency calls.
I think s/and not limited to/and is not limited to/ reads more
smoothly.
But it is clearly useful in many other cases. For
example, someone calling a company call center or a Public Safety
Answering Point (PSAP) should be able to indicate if one or more
specific signed, written, and/or spoken languages are preferred, the
callee should be able to indicate its capabilities in this area, and
the call proceed using in-common language(s) and media forms.
I think s/preferred, the callee/preferred; the callee/ because the
sentence is the concatenation of two sentences.
Perhaps s/in-common/shared/.
Including the user's human (natural) language preferences in the
session establishment negotiation is independent of the use of a
relay service and is transparent to a voice service provider.
I think it's even broader than "transparent to a voice service
provider" -- it's transparent to any service provider, assuming that
the media are language-neutral.
In the case of a call to e.g., an airline, the call could be
automatically handled by a Spanish-speaking agent.
I think s/handled by/routed to/ is the usual usage.
3. Desired Semantics
The desired solution is a media attribute (preferably per direction)
that may be used within an offer to indicate the preferred language
of each (direction of a) media stream, and within an answer to
indicate the accepted language.
In this one instance, I think you want to use "language(s)" to drive
home that multiple languages can be specified: "within an offer
to indicate the preferred language(s)".
(Negotiating multiple simultaneous languages within a media stream is
out of scope, as the complexity of doing so outweighs the
usefulness.)
You might want to say instead "(Negotiating multiple simultaneous
languages within a media stream is out of scope for this document.)"
to ensure that nobody decides to argue whether "the complexity of
doing so outweighs the usefulness".
4. The existing 'lang' attribute
RFC 4566 [RFC4566] specifies an attribute 'lang' which appears
similar to what is needed here, but is not sufficiently detailed for
use here.
"for use here" isn't quite right. Maybe "is not sufficiently specific
or flexible to satisfy the requirements".
In addition, it is not mentioned in [RFC3264]
"it" is somewhat ambiguous here, perhaps change to "the 'lang'
attribute".
5. Proposed Solution
Perhaps s/Proposed Solution/Solution/, since once this draft is
approved, it becomes the solution.
5.2. New 'humintlang-send' and 'humintlang-recv' attributes
a=humintlang-send:<language tag>
a=humintlang-recv:<language tag>
This is presented as the generic form of the attributes, but there is
no indication of the possible asterisk.
The values constitute a list of languages
in preference order (first is most preferred).
"The values" isn't very clear, because the values are in successive
attributes. You want to say something like "The sequence of values in
the occurrences of one of these attributes constitutes ...". However,
see the technical comments above.
When placing an emergency call, and in any other case where the
language cannot be assumed from context, each media stream in an
offer primarily intended for human language communication SHOULD
specify both (or in some cases, one of) the 'humintlang-send' and
'humintlang-recv' attributes.
Probably s/assumed/inferred/.
Could you be more accurate by
s/or in some cases/or for unidirectional streams/?
5.3. Advisory vs Required
The mechanism for indicating this preference is that, in an offer, if
the last character of any of the 'humintlang-recv' or
'humintlang-send' values is an asterisk, this indicates a request to
not fail the call (similar to SIP Accept-Language syntax). Either way,
the called party MAY ignore this, e.g., for the emergency services use
case, a PSAP will likely not fail the call.
The construction of this paragraph isn't quite complete. It says that
if an asterisk is present, a request shouldn't fail, but it doesn't
say that if no asterisk is present, a request should fail if there is
no language match. And it's the latter condition that makes the
second sentence meaningful. So I think you want to insert between the
two sentences one regarding the absence of an asterisk.
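
The complete rule, as I read it, could be sketched in Python as
follows. This is only an illustration under my interpretation: the
function names are hypothetical, matching is by exact
(case-insensitive) tag comparison rather than any BCP 47 matching
scheme, and the callee remains free to ignore the result.

```python
def has_asterisk(offered_values):
    """True if any offered humintlang value ends with the '*' marker."""
    return any(v.strip().endswith("*") for v in offered_values)


def has_language_match(offered_values, supported_tags):
    """True if any offered tag (asterisk stripped) is supported by the callee."""
    supported = {t.lower() for t in supported_tags}
    return any(
        v.strip().rstrip("*").lower() in supported for v in offered_values
    )


def offerer_requests_failure(offered_values, supported_tags):
    """True when there is no language match and no asterisk was offered.

    With an asterisk present, the offerer asks that the call not fail even
    without a match. Either way the callee MAY ignore the request.
    """
    return not has_language_match(
        offered_values, supported_tags
    ) and not has_asterisk(offered_values)
```

So an offer of "es" and "eu" to an English-only callee asks for the
call to fail, while the same offer with "eu*" asks for it to proceed.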
5.5. Examples
Given that the combined audio/video mechanism is the only irregularity
in this system, there ought to be an example of it. E.g.,
An example of a supplemental video stream with a spoken language
audio stream:
m=video 51372 RTP/AVP 31 32
a=humintlang-send:en
a=humintlang-recv:en
m=audio 49250 RTP/AVP 20
a=humintlang-send:en
a=humintlang-recv:en
6. IANA Considerations
humintlang-value = Language-Tag [ asterisk ]
; Language-Tag defined in RFC 5646
asterisk = "*"
s/Language-Tag defined in RFC 5646/Language-Tag as defined in RFC 5646/
But perhaps also s/RFC 5646/BCP 47/, which ensures that "humintlang"
tracks the current version of language tags.
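
For illustration, a small Python sketch of parsing a value against the
ABNF quoted above. The function name is hypothetical, and the regular
expression is only a loose shape check (hyphen-separated alphanumeric
subtags of 1 to 8 characters), not the full Language-Tag grammar.

```python
import re

# Loose shape check only: hyphen-separated alphanumeric subtags of 1 to 8
# characters. This is NOT the full RFC 5646 Language-Tag grammar.
_TAG_SHAPE = re.compile(r"[A-Za-z0-9]{1,8}(-[A-Za-z0-9]{1,8})*")


def parse_humintlang_value(value):
    """Split a humintlang-value into (language_tag, has_asterisk)."""
    has_asterisk = value.endswith("*")
    tag = value[:-1] if has_asterisk else value
    if not _TAG_SHAPE.fullmatch(tag):
        raise ValueError(f"not a plausible language tag: {tag!r}")
    return tag, has_asterisk
```

A real implementation would validate the tag with a proper BCP 47
parser instead of this pattern.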
Appendix A. Historic Alternative Proposal: Caller-prefs
This results in a more fragile solution since the media modality and
language would be negotiated using SIP, and then the specific media
formats (which inherently include the modality) would be negotiated
at a different level (typically SDP, especially in the emergency
calling cases), making it easier to have mismatches (such as where
the media modality negotiated in SIP don't match what was negotiated
using SDP).
"the media modality and language would be negotiated using SIP" isn't
quite the right way to say it because SIP isn't explicitly negotiating
the modality. Better would be
... the language (and by implication the media modality) would be
negotiated using SIP, and then the specific media (which inherently
include the modalities and formats) would be negotiated at a
different level ...