Re: Review of draft-ietf-slim-negotiating-human-language-06

Randall Gellens <rg+ietf@xxxxxxxxxxxxxxxxx> · Tue, 21 Feb 2017 17:44:42 -0800

Hi Dale,

Thank you for your review, I appreciate it.  Please see inline.

At 6:32 PM -0800 2/17/17, Dale Worley wrote:

 Reviewer: Dale Worley
 Review result: Ready with Nits

 I am the assigned Gen-ART reviewer for this draft.  The General Area
 Review Team (Gen-ART) reviews all IETF documents being processed
 by the IESG for the IETF Chair.  Please treat these comments just
 like any other last call comments.

 For more information, please see the FAQ at
 <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

 Document:  draft-ietf-slim-negotiating-human-language-06
 Reviewer:  Dale R. Worley
 Review Date:  2017-02-17
 IETF LC End Date:  2017-02-20
 IESG Telechat date:  [unknown]

 Summary:
        This draft is basically ready for publication, but has nits
        that should be fixed before publication.

 * Technical comments

 A. Call failure

 If a call fails due to no available language match, in what way(s)
 does it fail?  Section 5.3 says

    If such an offer is received, the receiver MAY
    reject the media, ignore the language specified, or attempt to
    interpret the intent

 But I suspect it's also allowed for the UAS to fail the call at the
 SIP level.  Whether or not that is allowed (or at least envisioned)
 should be described.  And what response code(s)/warn-code(s) should be
 used for that?

The text you quote has been deleted.  The draft does not mandate 
whether the call should proceed or fail when no language match is 
possible.  It does provide an optional mechanism to indicate the 
caller's preference that the call not fail, and it notes that in the 
emergency services case the call will likely proceed, but that is a 
matter of policy, not protocol.

 B. Audio/Video coordination

    5.2.  New 'humintlang-send' and 'humintlang-recv' attributes

    Note that while signed language tags are used with a video stream to
    indicate sign language, a spoken language tag for a video stream in
    parallel with an audio stream with the same spoken language tag
    indicates a request for a supplemental video stream to see the
    speaker.

 And there's a similar paragraph in 5.4:

    A spoken language tag for a video stream in conjunction with an audio
    stream with the same language might indicate a request for
    supplemental video to see the speaker.

 I think this mechanism needs to be described more exactly, and in
 particular, it should not depend on the UA understanding which
 language tags are spoken language tags.  It seems to me that a
 workable rule is that there is an audio stream and a video stream and
 they specify exactly the same language tag in their respective
 humintlang attributes.  In that case, it is a request for a spoken
 language with simultaneous video of the speaker, and those requests
 should be considered satisfied only if both streams can be
 established.

The text you quote has been deleted.  A media stream for supplemental 
purposes can be negotiated without a language tag, as normal.

 * The following three items are adjustments to the design which I'd
 like to know have been considered.

 C. "humintlang" seems long to me

 Given the excessive length of SDP in practice, it seems to me that a
 shorter attribute name would be desirable.  E.g., "humlang" as was
 used in some previous versions.  Or is there a coordinated usage with
 other names in the "hum*lang" pattern?

There is no intent for a coordinated pattern.  The name was chosen 
years ago to avoid potential confusion with the 'lang' attribute.  Is 
it worth reopening the issue to potentially save three characters per 
SDP line with a language?

 D. Use the Accept-Language syntax

 It seems to me that it would be better to use the Accept-Language syntax
 for the attribute values.  This allows (1) specifying the quality of
 language experience, allowing clear description of bilingualism, (2) a
 unified method of specifying whether or not arbitrary languages are
 acceptable, and (3) abbreviating SDP descriptions.

 In a way, the fact that the current proposal seems to require (but
 does not directly specify) the coordinated absence/presence of an
 asterisk on all of the repetitions of humintlang-send or
 humintlang-recv is a warning that the syntax doesn't represent the
 semantics as well as it might.

The group considered multiple proposals to permit specifying quality, 
preference, q-values, etc. but decided to keep things simple for this 
draft.  There is no intent to require the use of an asterisk (to 
indicate a preference by the caller to not fail the call).  The 
asterisk is a very mild mechanism with no normative effects.  It 
merely conveys the preference of the caller, and is not binding on 
the answerer.

 E. Have an attribute to abbreviate the bidirectionally-symmetric case

 Note that all examples are bidirectionally symmetric, and the text
 says that requests and responses SHOULD be bidirectionally symmetric.
 So it would be a very useful abbreviation to define
 "humintlang=<value>" to be equivalent to the combination of
 "humintlang-send=<value>" and "humintlang-recv=<value>".

 Combining proposals C, D, and E, the examples become

       m=audio 49170 RTP/AVP 0
       a=humlang:en

       m=video 51372 RTP/AVP 31 32
       a=humlang:ase,*;q=0.1

       m=audio 49250 RTP/AVP 20
       a=humlang:es,eu;q=0.9,en;q=0.8,*;q=0.1

       m=text 45020 RTP/AVP 103 104
       a=humlang:gr

 which requires about half as many characters as they have now.

A third attribute without the "-send" or "-recv" to indicate 
bidirectionality would reduce the characters in the SDP block, at the 
cost of some added complexity (e.g., what if all three appear).  I 
don't believe this has been discussed in the group.

 * Editorial comments and nits

 Abstract

    This document describes the need and a solution using new SDP stream
    attributes.

 I don't think the term "stream attribute" is used in RFC 4566.
 Instead, it uses "media attribute".

Fixed.

 1.  Introduction

    caller and callee know each other or there is contextual or out of
    band information from which the language(s) and media modalities can

 I think in this context it's preferred to hyphenate "out-of-band" to
 make it clearly be an adjective.

OK.

    This approach has a number of benefits, including that it is generic
    (applies to all interactive communications negotiated using SDP) and
    not limited to emergency calls.

 I think s/and not limited to/and is not limited to/ reads more
 smoothly.

There's no harm in the extra "is" so I'm happy to add it.

    But it is clearly useful in many other cases.  For
    example, someone calling a company call center or a Public Safety
    Answering Point (PSAP) should be able to indicate if one or more
    specific signed, written, and/or spoken languages are preferred, the
    callee should be able to indicate its capabilities in this area, and
    the call proceed using in-common language(s) and media forms.

 I think s/preferred, the callee/preferred; the callee/ because the
 sentence is the concatenation of two sentences.

I reworded the sentence to flow better:

   For example, it is helpful that someone calling a company call center
   or a Public Safety Answering Point (PSAP) be able to indicate
   preferred signed, written, and/or spoken languages, the callee be
   able to indicate its capabilities in this area, and the call proceed
   using the language(s) and media forms supported by both.

 Perhaps s/in-common/shared/.

Fixed in the rewording above.

    Including the user's human (natural) language preferences in the
    session establishment negotiation is independent of the use of a
    relay service and is transparent to a voice service provider.

 I think it's even broader than "transparent to a voice service
 provider" -- it's transparent to any service provider, assuming that
 the media are language-neutral.

I changed it to read "voice or other service provider".

    In the case of a call to e.g., an airline, the call could be
    automatically handled by a Spanish-speaking agent.

 I think s/handled by/routed to/ is the usual usage.

We are trying to be careful in the draft to not imply that it is 
discussing call routing.  I'd rather keep the more generic "handled 
by".

 3.  Desired Semantics

    The desired solution is a media attribute (preferably per direction)
    that may be used within an offer to indicate the preferred language
    of each (direction of a) media stream, and within an answer to
    indicate the accepted language.

 In this one instance, I think you want to use "language(s)" to drive
 home that multiple languages can be specified:  "within an offer
 to indicate the preferred language(s)".

    (Negotiating multiple simultaneous languages within a media stream is
    out of scope, as the complexity of doing so outweighs the
    usefulness.)

 You might want to say instead "(Negotiating multiple simultaneous
 languages within a media stream is out of scope for this document.)"
 to ensure that nobody decides to argue whether "the complexity of
 doing so outweighs the usefulness".

I agree and deleted "the complexity of doing so outweighs the usefulness".

 4.  The existing 'lang' attribute

    RFC 4566 [RFC4566] specifies an attribute 'lang' which appears
    similar to what is needed here, but is not sufficiently detailed for
    use here.

 "for use here" isn't quite right.  Maybe "is not sufficiently specific
 or flexible to satisfy the requirements".

    In addition, it is not mentioned in [RFC3264]

 "it" is somewhat ambiguous here, perhaps change to "the 'lang'
 attribute".

OK, accepted both changes.

 5.  Proposed Solution

 Perhaps /Proposed Solution/Solution/, since once this draft is
 approved, it becomes the solution.

OK.

 5.2.  New 'humintlang-send' and 'humintlang-recv' attributes

       a=humintlang-send:<language tag>
       a=humintlang-recv:<language tag>

 This is presented as the generic form of the attributes, but there is
 no indication of the possible asterisk.

The syntax has been deleted from 5.2 since it's now in 6.

    The values constitute a list of languages
    in preference order (first is most preferred).

 "The values" isn't very clear, because the values are in successive
 attributes.  You want to say something like "The sequence of values in
 the occurrences of one of these attributes constitutes ...".  However,
 see the technical comments above.

The text was reworded to read:

   The values from all
   instances of the attribute constitute a list of languages in
   preference order (first is most preferred).
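
To illustrate how a receiver might build that list, here is a minimal
sketch in Python.  It is only an illustration under my own assumptions,
not text from the draft: the function name language_preferences and the
returned dictionary layout are hypothetical, a trailing asterisk is
simply stripped from each value, and no validation of the language tags
is attempted.

```python
def language_preferences(sdp):
    """Collect humintlang values per media section, in order of appearance.

    Hypothetical helper: earlier attribute instances are treated as more
    preferred, matching "first is most preferred".
    """
    sections = []
    current = None  # attributes before the first m= line are ignored
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # Start a new media section; the first token is the media type.
            current = {"media": line[2:].split()[0], "send": [], "recv": []}
            sections.append(current)
        elif current is not None and line.startswith("a=humintlang-"):
            name, _, value = line[2:].partition(":")
            direction = name[len("humintlang-"):]
            value = value.strip().rstrip("*")  # drop the optional asterisk
            if direction in ("send", "recv") and value:
                current[direction].append(value)
    return sections


example = "\n".join([
    "m=audio 49250 RTP/AVP 20",
    "a=humintlang-send:es",
    "a=humintlang-send:eu",
    "a=humintlang-recv:es",
    "m=text 45020 RTP/AVP 103 104",
    "a=humintlang-recv:en*",
])
preferences = language_preferences(example)
```

In this example the audio stream ends up with the send list es, eu (es
most preferred) and the text stream with the single recv value en.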

    When placing an emergency call, and in any other case where the
    language cannot be assumed from context, each media stream in an
    offer primarily intended for human language communication SHOULD
    specify both (or in some cases, one of) the 'humintlang-send' and
    'humintlang-recv' attributes.

 Probably s/assumed/inferred/.

I agree.

 Could you be more accurate by
 s/or in some cases/or for unidirectional streams/?

I agree.

 5.3.  Advisory vs Required

    The mechanism for indicating this preference is that, in an offer, if
    the last character of any of the 'humintlang-recv' or 'humintlang-
    send' values is an asterisk, this indicates a request to not fail the
    call (similar to SIP Accept-Language syntax).  Either way, the called
    party MAY ignore this, e.g., for the emergency services use case, a
    PSAP will likely not fail the call.

 The construction of this paragraph isn't quite complete.  It says that
 if an asterisk is present, a request shouldn't fail, but it doesn't
 say that if no asterisk is present, a request should fail if there is
 no language match.  And it's the latter condition that makes the
 second sentence meaningful.  So I think you want to insert between the
 two sentences one regarding the absence of an asterisk.

I've reworded the section to read:

   A consideration with the ability to negotiate language is if the call
   proceeds or fails if the callee does not support any of the languages
   requested by the caller.  This document does not mandate either
   behavior, although it does provide a way for the caller to indicate a
   preference for the call succeeding when there is no language in
   common.  It is OPTIONAL for the callee to honor this preference.  For
   example, a PSAP is likely to attempt the call even without an
   indicated preference when there is no language in common, while a
   call center might choose to fail the call.

   The mechanism for indicating this preference is that, in an offer, if
   the last character of any of the 'humintlang-recv' or 'humintlang-
   send' values is an asterisk, this indicates a request to not fail the
   call.  The called party MAY ignore the indication, e.g., for the
   emergency services use case, regardless of the absence of an
   asterisk, a PSAP will likely not fail the call; some call centers
   might reject a call even with an asterisk.
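
As a rough sketch of how an answerer could apply this, the Python below
checks for the asterisk and then applies local policy.  Everything named
here is my own assumption for illustration: the functions
wants_call_to_proceed, common_languages and answerer_decision, the
proceed_without_match policy parameter, and the choice of exact,
case-insensitive tag comparison are not defined by the draft.

```python
def wants_call_to_proceed(offered_values):
    """True if any offered humintlang value ends with an asterisk."""
    return any(v.strip().endswith("*") for v in offered_values)


def common_languages(offered_values, supported_tags):
    """Offered tags the answerer supports, kept in the caller's order.

    Assumption: tags are compared exactly, ignoring case; no prefix or
    range matching is attempted.
    """
    supported = {tag.lower() for tag in supported_tags}
    offered = [v.strip().rstrip("*").lower() for v in offered_values]
    return [tag for tag in offered if tag in supported]


def answerer_decision(offered_values, supported_tags, proceed_without_match=None):
    """Return ("proceed", tag), ("proceed", None) or ("fail", None).

    proceed_without_match is hypothetical local policy: True (e.g., a
    PSAP), False (e.g., a strict call center), or None to follow the
    caller's asterisk hint.  Honoring the hint is optional.
    """
    matches = common_languages(offered_values, supported_tags)
    if matches:
        return ("proceed", matches[0])
    if proceed_without_match is None:
        proceed_without_match = wants_call_to_proceed(offered_values)
    return ("proceed", None) if proceed_without_match else ("fail", None)
```

With no language in common, a caller offering "es*" would proceed under
the default policy here, a PSAP-style policy (True) would proceed even
without the asterisk, and a strict policy (False) would fail the call
even with one.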

 5.5.  Examples

 Given that the combined audio/video mechanism is the only irregularity
 in this system, there ought to be an example of it.  E.g.,

    An example of a supplemental video stream with a spoken language
    audio stream:

       m=video 51372 RTP/AVP 31 32
       a=humintlang-send:en
       a=humintlang-recv:en

       m=audio 49250 RTP/AVP 20
       a=humintlang-send:en
       a=humintlang-recv:en

If the video stream is supplemental then it doesn't have a language 
(the text that suggested otherwise has been deleted).  But I am 
considering adding more examples.

 6.  IANA Considerations

       humintlang-value =  Language-Tag [ asterisk ]
                           ; Language-Tag defined in RFC 5646
       asterisk         =  "*"

 s/Language-Tag defined in RFC 5646/Language-Tag as defined in RFC 5646/

 But perhaps also s/RFC 5646/BCP 47/, which ensures that "humintlang"
 tracks the current version of language tags.

Ok.

 Appendix A.  Historic Alternative Proposal: Caller-prefs

    This
    results in a more fragile solution since the media modality and
    language would be negotiated using SIP, and then the specific media
    formats (which inherently include the modality) would be negotiated
    at a different level (typically SDP, especially in the emergency
    calling cases), making it easier to have mismatches (such as where
    the media modality negotiated in SIP don't match what was negotiated
    using SDP).

 "the media modality and language would be negotiated using SIP" isn't
 quite the right way to say it because SIP isn't explicitly negotiating
 the modality.  Better would be

    ... the language (and by implication the media modality) would be
    negotiated using SIP, and then the specific media (which inherently
    include the modalities and formats) would be negotiated at a
    different level ...

This section has been deleted.

 [END]

--
Randall Gellens
Opinions are personal;    facts are suspect;    I speak for myself only