Re: Review of draft-ietf-slim-negotiating-human-language-06

Natasha Rooney <nrooney@xxxxxxxx> · Tue, 21 Feb 2017 23:28:50 +0000

Hi Dale! Many thanks for these comments. As one of the SLIM chairs, I’ll work on getting some answers to you, or I’ll ask the author or one of the active SLIM participants to respond.

Bernard, Randall and SLIM - see below re Dale’s points. 

**** A. Call failure ****
Randall or Bernard, can you respond to this? I imagine the UA call failure method is UA-specific, but if there is a clash with SIP-level call failure then this should be addressed.

**** B. Audio/Video coordination ****
I believe the theory behind the current draft is that the audio and video streams will differ in cases such as sign language. Video could therefore be sign and audio would be a spoken language. I’m not sure whether his suggestion satisfies this case?

**** C. "humintlang" seems long to me ****
Bernard - I don’t see the issue with shortening humintlang, but the group might. I suggest we throw this to the group for discussion.

**** D. Use the Accept-Language syntax ****
Randall and Bernard - is this an acceptable change, or one we need to discuss further? It seems like a reasonable request, but also a larger change, which is why I ask!

**** E. Have an attribute to abbreviate the bidirectionally-symmetric case ****
I do not remember us having this discussion within the group, although it may have occurred before I became chair.
Randall, Brian or Bernard - has this idea been discussed before? If so, can one of you respond with an explanation as to why we haven’t done this?

**** Editorial comments and nits ****
Randall - can you take a look through Dale’s editorial comments and shout if there are any problems with these suggestions? If everything is ok, please make the changes.

Thanks all!

Natasha Rooney | Internet Engineering Director | Internet and Web Team | Technology | GSMA | nrooney@xxxxxxxx | +44 (0) 7730 219 765 | @thisNatasha | Skype: nrooney@xxxxxxx

On 18 Feb 2017, at 02:32, Dale Worley <worley@xxxxxxxxxxx> wrote:

Reviewer: Dale Worley

Review result: Ready with Nits

I am the assigned Gen-ART reviewer for this draft.  The General Area

Review Team (Gen-ART) reviews all IETF documents being processed

by the IESG for the IETF Chair.  Please treat these comments just

like any other last call comments.

For more information, please see the FAQ at

<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Document:  draft-ietf-slim-negotiating-human-language-06

Reviewer:  Dale R. Worley

Review Date:  2017-02-17

IETF LC End Date:  2017-02-20

IESG Telechat date:  [unknown]

Summary:

      This draft is basically ready for publication, but has nits

      that should be fixed before publication.

* Technical comments

A. Call failure

If a call fails due to no available language match, in what way(s)

does it fail?  Section 5.3 says

  If such an offer is received, the receiver MAY

  reject the media, ignore the language specified, or attempt to

  interpret the intent

But I suspect it's also allowed for the UAS to fail the call at the

SIP level.  Whether or not that is allowed (or at least envisioned)

should be described.  And what response code(s)/warn-code(s) should be

used for that?

B. Audio/Video coordination

  5.2.  New 'humintlang-send' and 'humintlang-recv' attributes

  Note that while signed language tags are used with a video stream to

  indicate sign language, a spoken language tag for a video stream in

  parallel with an audio stream with the same spoken language tag

  indicates a request for a supplemental video stream to see the

  speaker.

And there's a similar paragraph in 5.4:

  A spoken language tag for a video stream in conjunction with an audio

  stream with the same language might indicate a request for

  supplemental video to see the speaker.

I think this mechanism needs to be described more exactly, and in

particular, it should not depend on the UA understanding which

language tags are spoken language tags.  It seems to me that a

workable rule is that there is an audio stream and a video stream and

they specify exactly the same language tag in their respective

humintlang attributes.  In that case, it is a request for a spoken

language with simultaneous video of the speaker, and those requests

should be considered satisfied only if both streams can be

established.
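
As a rough illustration of the rule proposed above (a sketch, not text from the draft), the helper below treats an offer as a list of media descriptions and reports the language tags that appear on both an audio and a video stream. The dictionary layout and the name `supplemental_video_requests` are hypothetical, send and receive directions are merged for brevity, and tags are compared case-insensitively as BCP 47 allows.

```python
from typing import Dict, List, Set


def supplemental_video_requests(media: List[Dict]) -> Set[str]:
    """Return language tags present on both an audio and a video stream.

    Each entry is a hypothetical dict such as
    {"type": "audio", "langs": ["en"]}, where "langs" holds the tags
    taken from that stream's humintlang attributes.
    """
    def tags_for(kind: str) -> Set[str]:
        found: Set[str] = set()
        for stream in media:
            if stream["type"] == kind:
                # Language tags are case-insensitive, so normalise them.
                found.update(tag.lower() for tag in stream["langs"])
        return found

    # A tag on both stream types is read as "spoken language plus
    # simultaneous video of the speaker" under the proposed rule.
    return tags_for("audio") & tags_for("video")


offer = [
    {"type": "video", "langs": ["en"]},
    {"type": "audio", "langs": ["en"]},
]
paired = supplemental_video_requests(offer)
```

Nothing in this check needs to know whether a given tag names a spoken or a signed language, which is the point of the rule.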

* The following three items are adjustments to the design which I'd

like to know have been considered.

C. "humintlang" seems long to me

Given the excessive length of SDP in practice, it seems to me that a

shorter attribute name would be desirable.  E.g., "humlang" as was

used in some previous versions.  Or is there a coordinated usage with

other names in the "hum*lang" pattern?

D. Use the Accept-Language syntax

It seems to me that it would be better to use the Accept-Language syntax

for the attribute values.  This allows (1) specifying the quality of

language experience, allowing clear description of bilingualism, (2) a

unified method of specifying whether or not arbitrary languages are

acceptable, and (3) abbreviating SDP descriptions.

In a way, the fact that the current proposal seems to require (but

does not directly specify) the coordinated absence/presence of an

asterisk on all of the repetitions of humintlang-send or

humintlang-recv is a warning that the syntax doesn't represent the

semantics as well as it might.
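
To make the Accept-Language idea concrete, here is a minimal sketch of how a receiver might read such a value. The function name `parse_language_ranges` is hypothetical, the code handles only language ranges and the q parameter, and it does no validation of the tags themselves.

```python
from typing import List, Tuple


def parse_language_ranges(value: str) -> List[Tuple[str, float]]:
    """Parse 'es,eu;q=0.9,*;q=0.1' into (range, q) pairs, best first."""
    ranges: List[Tuple[str, float]] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a missing q parameter means q=1
        for param in params.split(";"):
            name, _, number = param.strip().partition("=")
            if name.strip().lower() == "q" and number:
                q = float(number)
        ranges.append((tag.strip(), q))
    # sorted() is stable, so equal q values keep their written order.
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


preferences = parse_language_ranges("es,eu;q=0.9,en;q=0.8,*;q=0.1")
```

A single value then carries both the preference order and, through the `*` range, whether arbitrary languages are acceptable.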

E. Have an attribute to abbreviate the bidirectionally-symmetric case

Note that all examples are bidirectionally symmetric, and the text

says that requests and responses SHOULD be bidirectionally symmetric.

So it would be a very useful abbreviation to define

"humintlang=<value>" to be equivalent to the combination of

"humintlang-send=<value>" and "humintlang-recv=<value>".

Combining proposals C, D, and E, the examples become

     m=audio 49170 RTP/AVP 0

     a=humlang:en

     m=video 51372 RTP/AVP 31 32

     a=humlang:ase,*;q=0.1

     m=audio 49250 RTP/AVP 20

     a=humlang:es,eu;q=0.9,en;q=0.8,*;q=0.1

     m=text 45020 RTP/AVP 103 104

     a=humlang:gr

which requires about half as many characters as they have now.
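
The abbreviation in proposal E amounts to a mechanical expansion, which could be sketched as follows. This assumes the usual SDP `a=<name>:<value>` line form, and the function name `expand_symmetric` and its `name` parameter are hypothetical.

```python
from typing import List


def expand_symmetric(sdp_line: str, name: str = "humintlang") -> List[str]:
    """Expand 'a=<name>:<value>' into its -send and -recv equivalents.

    Lines that are not the symmetric shorthand are returned unchanged.
    """
    prefix = f"a={name}:"
    if sdp_line.startswith(prefix):
        value = sdp_line[len(prefix):]
        return [f"a={name}-send:{value}", f"a={name}-recv:{value}"]
    return [sdp_line]


expanded = expand_symmetric("a=humintlang:en")
```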

* Editorial comments and nits

Abstract

  This document describes the need and a solution using new SDP stream

  attributes.

I don't think the term "stream attribute" is used in RFC 4566.

Instead, it uses "media attribute".

1.  Introduction

  caller and callee know each other or there is contextual or out of

  band information from which the language(s) and media modalities can

I think in this context, it's preferred to hyphenate "out-of-band" to

make it clearly be an adjective.

  This approach has a number of benefits, including that it is generic

  (applies to all interactive communications negotiated using SDP) and

  not limited to emergency calls.

I think s/and not limited to/and is not limited to/ reads more

smoothly.

  But it is clearly useful in many other cases.  For

  example, someone calling a company call center or a Public Safety

  Answering Point (PSAP) should be able to indicate if one or more

  specific signed, written, and/or spoken languages are preferred, the

  callee should be able to indicate its capabilities in this area, and

  the call proceed using in-common language(s) and media forms.

I think s/preferred, the callee/preferred; the callee/ because the

sentence is the concatenation of two sentences.

Perhaps s/in-common/shared/.

  Including the user's human (natural) language preferences in the

  session establishment negotiation is independent of the use of a

  relay service and is transparent to a voice service provider. 

I think it's even broader than "transparent to a voice service

provider" -- it's transparent to any service provider, assuming that

the media are language-neutral.

  In the case of a call to e.g., an airline, the call could be

  automatically handled by a Spanish-speaking agent.

I think s/handled by/routed to/ is the usual usage.

3.  Desired Semantics

  The desired solution is a media attribute (preferably per direction)

  that may be used within an offer to indicate the preferred language

  of each (direction of a) media stream, and within an answer to

  indicate the accepted language.

In this one instance, I think you want to use "language(s)" to drive

home that multiple languages can be specified:  "within an offer

to indicate the preferred language(s)".

  (Negotiating multiple simultaneous languages within a media stream is

  out of scope, as the complexity of doing so outweighs the

  usefulness.)

You might want to say instead "(Negotiating multiple simultaneous

languages within a media stream is out of scope for this document.)"

to ensure that nobody decides to argue whether "the complexity of

doing so outweighs the usefulness".

4.  The existing 'lang' attribute

  RFC 4566 [RFC4566] specifies an attribute 'lang' which appears

  similar to what is needed here, but is not sufficiently detailed for

  use here.

"for use here" isn't quite right.  Maybe "is not sufficiently

specific or flexible to satisfy the requirements".

  In addition, it is not mentioned in [RFC3264]

"it" is somewhat ambiguous here, perhaps change to "the 'lang'

attribute".

5.  Proposed Solution

Perhaps s/Proposed Solution/Solution/, since once this draft is

approved, it becomes the solution.

5.2.  New 'humintlang-send' and 'humintlang-recv' attributes

     a=humintlang-send:<language tag>

     a=humintlang-recv:<language tag>

This is presented as the generic form of the attributes, but there is

no indication of the possible asterisk.

  The values constitute a list of languages

  in preference order (first is most preferred).

"The values" isn't very clear, because the values are in successive

attributes.  You want to say something like "The sequence of values

in the occurrences of one of these attributes constitutes ...".  However,

see the technical comments above.

  When placing an emergency call, and in any other case where the

  language cannot be assumed from context, each media stream in an

  offer primarily intended for human language communication SHOULD

  specify both (or in some cases, one of) the 'humintlang-send' and

  'humintlang-recv' attributes.

Probably s/assumed/inferred/.

Could you be more accurate by

s/or in some cases/or for unidirectional streams/?

5.3.  Advisory vs Required

  The mechanism for indicating this preference is that, in an offer, if

  the last character of any of the 'humintlang-recv' or

  'humintlang-send' values is an asterisk, this indicates a request to not fail the

  call (similar to SIP Accept-Language syntax).  Either way, the called

  party MAY ignore this, e.g., for the emergency services use case, a

  PSAP will likely not fail the call.

The construction of this paragraph isn't quite complete.  It says that

if an asterisk is present, a request shouldn't fail, but it doesn't

say that if no asterisk is present, a request should fail if there is

no language match.  And it's the latter condition that makes the

second sentence meaningful.  So I think you want to insert between the

two sentences one regarding the absence of an asterisk.
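
The two halves of that logic can be written out as a small sketch. It assumes the reading suggested here (no asterisk and no language match means the caller would rather the call fail); the function name `caller_prefers_failure` is hypothetical, and the called party MAY ignore the result either way.

```python
from typing import Iterable, Set


def caller_prefers_failure(values: Iterable[str], supported: Set[str]) -> bool:
    """Return True if the offer asks for failure on a language mismatch.

    `values` are humintlang-send/recv values such as "es" or "es*";
    `supported` is the set of language tags the callee can handle.
    """
    values = list(values)
    # Strip the optional trailing asterisk to recover the bare tag.
    offered = {v.rstrip("*").lower() for v in values}
    has_match = bool(offered & {s.lower() for s in supported})
    # An asterisk on any value is a request not to fail the call.
    asterisk_present = any(v.endswith("*") for v in values)
    # Advisory only: a PSAP, for example, will likely proceed regardless.
    return not has_match and not asterisk_present


strict = caller_prefers_failure(["es", "eu"], {"en"})
lenient = caller_prefers_failure(["es*", "eu"], {"en"})
```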

5.5.  Examples

Given that the combined audio/video mechanism is the only irregularity

in this system, there ought to be an example of it.  E.g.,

  An example of a supplemental video stream with a spoken language

  audio stream:

     m=video 51372 RTP/AVP 31 32

     a=humintlang-send:en

     a=humintlang-recv:en

     m=audio 49250 RTP/AVP 20

     a=humintlang-send:en

     a=humintlang-recv:en

6.  IANA Considerations

     humintlang-value =  Language-Tag [ asterisk ]

                         ; Language-Tag defined in RFC 5646

     asterisk         =  "*"

s/Language-Tag defined in RFC 5646/Language-Tag as defined in RFC 5646/

But perhaps also s/RFC 5646/BCP 47/, which ensures that "humintlang"

tracks the current version of language tags.

Appendix A.  Historic Alternative Proposal: Caller-prefs

  This

  results in a more fragile solution since the media modality and

  language would be negotiated using SIP, and then the specific media

  formats (which inherently include the modality) would be negotiated

  at a different level (typically SDP, especially in the emergency

  calling cases), making it easier to have mismatches (such as where

  the media modality negotiated in SIP don't match what was negotiated

  using SDP).

"the media modality and language would be negotiated using SIP" isn't

quite the right way to say it because SIP isn't explicitly

negotiating the modality.  Better would be

  ... the language (and by implication the media modality) would be

  negotiated using SIP, and then the specific media (which inherently

  include the modalities and formats) would be negotiated at a

  different level ...

[END]
