At 10:41 AM +0100 2/15/17, Gunnar Hellström wrote:
On 2017-02-15 at 01:39, Randall Gellens wrote:
At 4:21 PM -0800 2/14/17, Randy Presuhn wrote:
Hi -
On 2/14/2017 2:43 PM, Randall Gellens wrote:
At 8:59 PM +0100 2/14/17, Gunnar Hellström wrote:
On 2017-02-14 at 19:05, Randy Presuhn wrote:
Hi -
On 2/14/2017 9:40 AM, Randall Gellens wrote:
At 11:01 AM +0100 2/14/17, Gunnar Hellström wrote:
My proposal for a reworded section 5.4 is:
5.4. Unusual language indications
It is possible to specify an unusual indication where the language
specified may look unexpected for the media type.
For such cases the following guidance SHALL be applied for the
humintlang attributes used in these situations.
1. A view of a speaking person in the video stream SHALL, when it
has relevance for speech perception, be indicated by a Language-Tag
for spoken/written language with the "Zxxx" script subtag to indicate
that the content is not written.
2. Text captions included in the video stream SHALL be indicated
by a Language-Tag for spoken/written language.
3. Any approximate representation of sign language or
fingerspelling in the text media stream SHALL be indicated by a
Language-Tag for a sign language in text media.
4. When sign language related audio from a person using sign
language is of importance for language communication, this SHALL be
indicated by a Language-Tag for a sign language in audio media.
[RG] As I said, I think we should avoid specifying this until we have
deployment experience.
...
From a process perspective, it's far easier to remove constraints
as a specification advances than it is to add them.
I agree. It is often better to specify normatively as far as you can
imagine, so that interoperability and good functionality are achieved.
Stopping halfway and having MAY in the specifications creates
uncertainty and less useful specifications.
My reading of what Randy says is the opposite of Gunnar's. In my
reading, Randy points out that it is easier to remove the SHOULD NOT in
the future than it is to change the meaning of the combinations or
switch to a different mechanism.
In my experience, it's better to specify only what we know we need and
what we know we understand. Speculative specifications "as far as you
can imagine" more often lead to interoperability problems, unnecessary
complexity, limitations on what's needed in the future, and divergent
implementations.
I think the difference in your positions comes down to
(1) your respective notions of "what we know we need and what we
know we understand";
(2) whether you believe that the interoperability and conformance
consequences of removing a "SHOULD NOT" could be the same
as those of merely retaining a "MUST" or "SHALL" - this determines
whether Randy G.'s proposal provides a path for some future
revision to mandate (if deployment experience substantiates the
need/understanding) the behavior proposed by Gunnar. That path
is not at all obvious to me.
The purpose of the draft is to enable the two endpoints of a real-time
communication session to agree which languages and media to use for
interactive communication. We have a mechanism of adding language tags
to media stream negotiations. In most cases, the language and media
modality are an obvious fit. There are combinations of media and
language where the meaning is not so obvious, specifically, signed
language tags with audio or text, and non-signed language tags with
video. My proposal is that we say the offerer SHOULD NOT send such
combinations and the answerer MAY ignore the language. This allows
future specifications for the underlying uses Gunnar wants (such as
real-time subtitles in video and signed equivalents in text). Such
future specifications could define a use for the language and media
combinations and remove the SHOULD NOT send and MAY ignore, or could
define a new mechanism. I don't think we know enough now to dictate
what the solution should be.
We have a fresh example from our own discussions in the SLIM group of
how unfortunate it is not to be sufficiently explicit in the first
edition of a standard: the SDP Lang attribute in RFC 4566, where you
(Randall) say it is intended for specifying a set of languages that all
must be used in a session, while I say that it is intended for
negotiation of at least one initial language. Having that uncertainty
in a published specification makes it very hard to sharpen up the
specification afterwards, because doing so could make some
implementations non-conformant. And it makes potential implementors
hesitant to use the current specifications, as was the case with the
SLIM work.
I don't believe the two cases are comparable. The original SDP language
attribute was unclear about how it was to be used. The new attributes
specified in the current draft are clearly defined for the set of cases
that has been the target solution space all along: allowing the two
sides to agree on which language will be used in interactive
communications. You want to extend them for a related solution space,
which is additional services to foster communication, such as adding
real-time captioning to video. I think it's better to do that add-on
work in a subsequent draft.
For 5.4.
I am OK with modifying from my latest proposal, but we need to be specific.
I am also OK with reducing the SHALLs to SHOULDs as Addison requested.
The situation is not that we lack knowledge. Here is what we know about
the 4 cases of "unusual" indications:
1. View of the speaker in video. Very important for speech perception.
Quality requirements are documented in ITU-T H-series Supplement 1. Of
real use only as a complement to the same spoken language in audio.
Now that we know about the Zxxx notation for non-written content, we
also have a good way of specifying it precisely.
This case was also described in section 5.2 already.
It's already possible to negotiate a video stream.
2. Text captions in the video stream.
This can be either text merged into the video and communicated as a
true part of the video image, or a text component of a multimedia
system, such as MPEG-4, declared in SDP as m=video. It has been used in
some videophone products, but I have not seen it used lately. It is a
clearly defined case, and we can specify coding for it, but we do not
at the moment know whether it will be important to specify it.
I believe this is an additional service that
should be specified in a subsequent draft.
3. Sign language or fingerspelling in the text stream.
I have seen a product using it for claimed sign language conversation.
It is also used, in a simple text form with words in capitals
approximately representing signs, between persons involved in the
preparation of sign language productions and translations. But in that
case it is in a session where they agree in other ways to start using
the text stream for that purpose. So I think we can say that this is
rare, and its use can be agreed by other means between the users.
Still, it is a clearly defined case.
I believe this is an additional service that
should be specified in a subsequent draft.
4. Audio from a signing person related to sign language. This is more
vague than the others. It may be a person signing in video and adding
spoken words in audio to the signing, but influenced by the word order
and grammar of sign language, with some ambition to make it reasonably
understandable for both deaf and hearing participants. There are even
some spoken words created from sign language that are commonly used by
hearing persons in such situations. But even for that case I think it
is better to define the audio part as the spoken language it is derived
from, because of its intention to be understandable for hearing
persons. All other variants I can imagine are even closer to the spoken
language and should be specified with a spoken language tag. If we only
want to have the audio stream established to hear the background in the
signing situation, then we should not specify language use of the audio
stream.
Even if we know what a sign language tag in an audio stream would mean,
it may be just as good to leave it undefined.
I believe this is an additional service that
should be specified in a subsequent draft.
------------------------------------------------------------------------------------------------------------------------------------------------
So, new proposal:
5.4. Unusual language indications
It is possible to specify an unusual indication where the language
specified may look unexpected for the media type.
For such cases the following guidance SHOULD be applied for the
humintlang attributes used in these situations.
1. A view of a speaking person in the video stream SHOULD, when it
has relevance for speech perception, be indicated by a humintlang
attribute with a Language-Tag for a spoken/written language with the
"Zxxx" script subtag to indicate that the content is not written.
2. Text captions included in the video stream SHOULD be indicated
by a humintlang attribute with a Language-Tag for a spoken/written
language.
3. A Language-Tag for a sign language
specified in a humintlang attribute for a text
stream MAY be interpreted as use of an
approximate representation of sign language or
fingerspelling in the text media stream. The
use of such representation is rare and usually
conveniently agreed by other means between the
users during an established session. Common
support of this indication SHOULD NOT be
assumed or required.
4. A Language-Tag for a sign language
specified in a humintlang attribute for an
audio stream SHOULD NOT be indicated and MAY be
ignored on reception. Any use of spoken words
or spoken language in the audio stream SHOULD,
when it can be of importance for language
communication, be indicated by the
corresponding Language-Tag for spoken language
in a humintlang attribute for the audio stream.
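
As a rough illustration of how a receiving endpoint might read the four
items above, here is a minimal Python sketch. Everything in it is an
assumption made for the example: the function names, the returned
strings, and the tiny hard-coded set of sign language subtags (a real
implementation would presumably consult the IANA Language Subtag
Registry). The check for "Zxxx" is also simplified and does not parse
the tag fully.

```python
# Hypothetical, illustrative subset of sign language primary subtags:
# ase = American, bfi = British, swl = Swedish Sign Language.
SIGN_LANGUAGE_SUBTAGS = {"ase", "bfi", "swl"}


def is_sign_language(tag: str) -> bool:
    """True if the primary subtag is in the illustrative sign language set."""
    return tag.split("-")[0].lower() in SIGN_LANGUAGE_SUBTAGS


def has_zxxx_script(tag: str) -> bool:
    """Simplified check: any subtag after the first equals 'Zxxx'."""
    return any(sub.lower() == "zxxx" for sub in tag.split("-")[1:])


def interpret(media: str, tag: str) -> str:
    """Map a (media type, Language-Tag) pair to a reading of items 1-4."""
    signed = is_sign_language(tag)
    if media == "video":
        if signed:
            return "usual: sign language in video"
        if has_zxxx_script(tag):
            return "item 1: view of speaker for speech perception"
        return "item 2: text captions in the video stream"
    if media == "text":
        if signed:
            return "item 3: MAY be approximate sign representation"
        return "usual: written language in text"
    if media == "audio":
        if signed:
            return "item 4: SHOULD NOT be sent, MAY be ignored"
        return "usual: spoken language in audio"
    raise ValueError(f"unsupported media type: {media!r}")


if __name__ == "__main__":
    for media, tag in [("video", "en-Zxxx"), ("video", "en"),
                       ("text", "ase"), ("audio", "swl"), ("audio", "sv")]:
        print(f"m={media} {tag}: {interpret(media, tag)}")
```

The sketch treats any spoken/written tag on video without "Zxxx" as
captions, which is only one possible reading of items 1 and 2.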
Gunnar
--
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@xxxxxxxxxx
+46 708 204 288
--
Randall Gellens
Opinions are personal; facts are suspect; I speak for myself only