Re: [Slim] IETF last call for draft-ietf-slim-negotiating-human-language (Section 5.4)

Randall Gellens <rg+ietf@xxxxxxxxxxxxxxxxx> · Wed, 15 Feb 2017 16:26:45 -0800

At 3:52 PM -0800 2/15/17, Bernard Aboba wrote:

 Gunnar Hellstrom said:

 "The SDP Lang attribute in RFC 4566, where you 
(Randall) say it is intended for specifying a 
set of languages that all must be used in a 
session, while I say that it is intended for 
negotiation of at least one initial language."

 [BA] At IETF 96 in Berlin, we had a discussion 
of the history of the SDP Lang attribute within 
the MMUSIC WG.

 The Lang attribute was originally specified in 
RFC 2327, which was published in April 1998, 
more than four years prior to the publication 
of Offer/Answer RFC 3264 (June 2002), and three 
years prior to publication of the initial 
draft-rosenberg-mmusic-sdp-offer-answer-00 
(April 26, 2001).

 As a result, the Lang attribute could not have 
been designed for use in Offer/Answer 
negotiation, but instead was intended for use 
in the declarative SDP of multicast 
conferencing.  Note that the Lang attribute was 
not mentioned in RFC 3264, and no one at the 
MMUSIC WG session was aware of a subsequent SIP 
Offer/Answer implementation of it.

Which is what I was saying: it is descriptive of 
the media, which is very different from 
negotiation.  However, this is all moot now.

 On Wed, Feb 15, 2017 at 1:41 AM, Gunnar 
Hellström 
<gunnar.hellstrom@xxxxxxxxxx> 
wrote:

 Den 2017-02-15 kl. 01:39, skrev Randall Gellens:

 At 4:21 PM -0800 2/14/17, Randy Presuhn wrote:

  Hi -

  On 2/14/2017 2:43 PM, Randall Gellens wrote:

  At 8:59 PM +0100 2/14/17, Gunnar Hellström wrote:

   Den 2017-02-14 kl. 19:05, skrev Randy Presuhn:

   Hi -

   On 2/14/2017 9:40 AM, Randall Gellens wrote:

   At 11:01 AM +0100 2/14/17, Gunnar Hellström wrote:

    My proposal for a reworded section 5.4 is:

    5.4.  Unusual language indications

    It is possible to specify an unusual indication where the language
    specified may look unexpected for the media type.

    For such cases the following guidance SHALL be applied for the
   humintlang attributes used in these situations.

    1.    A view of a speaking person in the video stream SHALL, when it
   has relevance for speech perception, be indicated by a Language-Tag
   for spoken/written language with the "Zxxx" script subtag to indicate
   that the content is not written.

    2.    Text captions included in the video stream SHALL be indicated
   by a Language-Tag for spoken/written language.

    3.    Any approximate representation of sign language or
   fingerspelling in the text media stream SHALL be indicated by a
   Language-Tag for a sign language in text media.

    4.    When sign language related audio from a person using sign
   language is of importance for language communication, this SHALL be
   indicated by a Language-Tag for a sign language in audio media.

   [RG] As I said, I think we should avoid specifying this until we have
   deployment experience.

   ...

   From a process perspective, it's far easier to remove constraints
   as a specification advances than it is to add them.

   I agree. It is often better to specify normatively as far as you can
  imagine, so that interoperability and good functionality are achieved.
  Stopping halfway and having MAY in the specifications creates
  uncertainty and less useful specifications.

  My reading of what Randy says is the opposite of Gunnar's. In my
  reading, Randy points out that it is easier to remove the SHOULD NOT in
  the future than it is to change the meaning of the combinations or
  switch to a different mechanism.

  In my experience, it's better to specify only what we know we need and
  what we know we understand.  Speculative specifications "as far as you
  can imagine" more often lead to interoperability problems, unnecessary
  complexity, limitations on what's needed in the future, and divergent
  implementations.

  I think the difference in your positions comes down to

    (1) your respective notions of "what we know we need and what we
        know we understand";

    (2) whether you believe that the interoperability and conformance
        consequences of removing a "SHOULD NOT" could be the same
        as those of merely retaining a "MUST" or "SHALL" - this determines
        whether Randy G.'s proposal provides a path for some future
        revision to mandate (if deployment experience substantiates the
        need/understanding) the behavior proposed by Gunnar. That path
        is not at all obvious to me.

 The purpose of the draft is to enable the two 
endpoints of a real-time communication session 
to agree which languages and media to use for 
interactive communication.  We have a mechanism 
of adding language tags to media stream 
negotiations.  In most cases, the language and 
media modality are an obvious fit.  There are 
combinations of media and language where the 
meaning is not so obvious, specifically, signed 
language tags with audio or text, and 
non-signed language tags with video.  My 
proposal is that we say the offerer SHOULD NOT send 
such combinations and the answerer MAY ignore 
the language. This allows future specifications for 
the underlying uses Gunnar wants (such as 
real-time subtitles in video and signed 
equivalents in text).  Such future 
specifications could define a use for the 
language and media combinations and remove the 
SHOULD NOT send and MAY ignore, or could define 
a new mechanism.  I don't think we know enough 
now to dictate what the solution should be.

 We have a fresh example from our own 
discussions in the SLIM group how unfortunate 
it is to not be sufficiently explicit in the 
first edition of a standard. The SDP Lang 
attribute in RFC 4566, where you (Randall) say 
it is intended for specifying a set of 
languages that all must be used in a session, 
while I say that it is intended for negotiation 
of at least one initial language. Having 
that uncertainty in a published specification 
makes it very hard to sharpen up 
the specification afterwards because it would 
possibly make some implementations 
non-conformant. And it makes potential implementors 
hesitant to use the current specifications, as 
it was with the SLIM work.

 For 5.4.

 I am OK with modifying from my latest proposal, but we need to be specific.
 I am also OK with reducing the SHALLs to SHOULDs as Addison requested.

 The situation is not that we lack knowledge. 
Here is what we know about the 4 cases of 
"unusual" indications:

 1. View of the speaker in video. Very important 
for speech perception. Quality requirements are 
documented in ITU-T H-series Supplement 1. Of 
real use only as a complement to the same 
spoken language in audio. Now that we know 
about the Zxxx notation for non-written, we 
also have a good way of specifying it precisely.
 This case was also described in section 5.2 already.

 2. Text captions in the video stream.
 This can be either text merged into video and 
communicated as a true part of the video image, 
or it can be a text component of a multimedia 
system, such as MPEG-4, declared in SDP as m=video.
 It has been used in some videophone products, 
but I have not seen it used lately.
 It is a clearly defined case, and we can 
specify coding for it, but we do not at the 
moment know if it will be important to specify 
it.

 3. Sign language or fingerspelling in the text stream.
 I have seen a product using it for claimed sign 
language conversation. It is also in use in the 
simple text form with words in capitals 
approximately representing signs between 
persons involved in preparation of sign 
language productions and translations. But in 
that case it is in a session where they agree 
in other ways to start using the text stream 
for that purpose. So I think we can say that 
this is rare, and its use can be agreed by 
other means between the users. Still it is a 
clearly defined case.

 4. Audio from signing person related to sign 
language. This is more vague than the others. 
It may be a person signing in video and adding 
spoken words in audio to signing, but 
influenced by the word order and grammar of 
sign language with some ambition to make it 
reasonably understandable for both deaf and 
hearing participants. There are even some 
spoken words created from sign language that 
are commonly used by hearing persons in such 
situations. But even for that case I think it 
is better to define the audio part as the 
spoken language it is derived from, because of 
its intention to be understandable for hearing 
persons. All other variants I can imagine are 
even closer to the spoken language and should 
be specified with spoken language tag. If we 
only want to have the audio stream established 
to hear the background in the signing 
situation, then we should not specify language 
use of the audio stream.
 Even if we know what a sign language tag in an 
audio stream would be, it may be just as good to 
leave it undefined.

------------------------------------------------------------------------------------------------------------------------------------------------
 So, new proposal:

 5.4.  Unusual language indications

    It is possible to specify an unusual indication where the language
    specified may look unexpected for the media type.

    For such cases the following guidance SHOULD be applied for the
   humintlang attributes used in these situations.

    1.    A view of a speaking person in the video stream SHOULD, when it
   has relevance for speech perception, be indicated by a humintlang
   attribute with a Language-Tag for a spoken/written language with the
   "Zxxx" script subtag to indicate that the content is not written.

    2.    Text captions included in the video stream SHOULD be indicated
   by a humintlang attribute with a Language-Tag for spoken/written language.

    3.    A Language-Tag for a sign language 
specified in a humintlang attribute for a text 
stream MAY be interpreted as use of an 
approximate representation of sign language or 
fingerspelling in the text media stream. The 
use of such representation is rare and usually 
conveniently agreed by other means between the 
users during an established session. Common 
support of this indication SHOULD NOT be 
assumed or required.

    4.    A Language-Tag for a sign language 
specified in a humintlang attribute for an 
audio stream SHOULD NOT be indicated and MAY be 
ignored on reception. Any use of spoken words 
or spoken language in the audio stream SHOULD, 
when it can be of importance for language 
communication, be indicated by the 
corresponding Language-Tag for spoken language 
in a humintlang attribute for the audio stream.
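
 As an illustration only (not part of the proposed text), the four
items above can be sketched as a small lookup on media type and
Language-Tag. The function names, the returned labels, and the short
list of sign language subtags are assumptions made for this sketch; a
real implementation would consult the IANA language subtag registry
rather than a hard-coded set.

```python
# Illustrative subset of ISO 639-3 sign language subtags (assumption):
# ase = American, bfi = British, swl = Swedish Sign Language.
SIGN_LANGUAGE_SUBTAGS = {"ase", "bfi", "swl"}


def is_sign_language(tag):
    """True if the primary subtag is in the illustrative sign language set."""
    return tag.split("-")[0].lower() in SIGN_LANGUAGE_SUBTAGS


def has_zxxx_script(tag):
    """True if the tag carries the 'Zxxx' (not written) script subtag."""
    return any(part.lower() == "zxxx" for part in tag.split("-")[1:])


def interpret(media, tag):
    """Map a (media, Language-Tag) pair to a label, following items 1-4."""
    signed = is_sign_language(tag)
    if media == "video":
        if signed:
            return "sign language in video"            # the usual case
        if has_zxxx_script(tag):
            return "view of speaker"                   # item 1
        return "text captions in video"                # item 2
    if media == "text":
        if signed:
            return "approximate sign representation"   # item 3, MAY
        return "written language in text"              # the usual case
    if media == "audio":
        if signed:
            return "ignored"                           # item 4, MAY ignore
        return "spoken language in audio"              # the usual case
    raise ValueError("unknown media type: " + media)
```

 For example, interpret("video", "en-Zxxx") falls under item 1, while
interpret("audio", "ase") falls under item 4 and is ignored.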

 Gunnar

 --
 -----------------------------------------
 Gunnar Hellström
 Omnitor
 gunnar.hellstrom@xxxxxxxxxx
 +46 708 204 288

 _______________________________________________
 SLIM mailing list
 SLIM@xxxxxxxx
 https://www.ietf.org/mailman/listinfo/slim

--
Randall Gellens
Opinions are personal;    facts are suspect;    I speak for myself only
-------------- Randomly selected tag: ---------------
Computers are not intelligent.  They only think they are.