Re: [Slim] IETF last call for draft-ietf-slim-negotiating-human-language (Section 5.4)

Addison,

On 2017-02-15 at 18:12, Phillips, Addison wrote:
Gunnar replied:
On 2017-02-14 at 21:39, Phillips, Addison wrote:
I have some allergy to the SHALL language: there is no way to automatically
determine conformance. Many language tags represent nonsensical values, due
to the nature of language tag composition. Content providers need to use care
in selecting the tags that they use and this section is merely pointing out good
guidance for tag selection, albeit in a heavy-handed way. BCP 47 (RFC 5646)
Section 4.1 [1] already provides most of this guidance, and a reference to that
source might be useful here, if only because that document requires it:
<quote>
     Standards, protocols, and applications that reference this document
     normatively but apply different rules to the ones given in this
     section MUST specify how language tag selection varies from the
     guidelines given here.
</quote>

I would suggest reducing the SHALL items to SHOULD.
Accepted.
That also opens up another use that we have discussed before but were advised
against: indicating use of written language by attaching a script
subtag even when the script subtag we use is suppressed by BCP 47. We can drop that
need, however, by using the Zxxx script subtag for non-written content, and clearly
include that usage in our specification as BCP 47 requires.
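To make the subtag mechanics concrete (this is my reading of the IANA registry; the tags are only illustrative):

    en        English, script unspecified
    en-Latn   discouraged: "Latn" is the Suppress-Script for "en"
    en-Zxxx   English in non-written form ("Zxxx" is the code for unwritten documents)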
I don't necessarily think that mandating a script subtag as a signal of written content (vs. spoken content) is that useful. In most protocols, the written nature of the content is indicated by the presence of text. Trying to coerce the language modality via language tags seems complicated, especially since most language tags are harvested from the original source. Introducing processes to evaluate and insert or remove script subtags seems unnecessary to me. That said, I have no objection to content using script subtags if they are useful.
In this case we are negotiating the use of media streams before they are established, so that the connection is made between the best-capable devices and call participants. Nothing in the available media coding parameters indicates whether text will be carried in video. So, if we want to be able to specify the three modalities possible in video, we need differentiated notations for them: 1. view of a signing person, 2. view of a speaking person, 3. text.

For 1, the signing person, it is simple, because the language subtags are explicit in that they indicate sign language. But for the other two I was not aware of any useful way before I was informed about the Zxxx script subtag. The reasoning about the need to be able to distinguish these led me to specify that for the view of the signing person we use the Zxxx script subtag, while for any text we do not need to specify a script subtag. The view of the speaking person is the only really important alternative among the four identified "silly states", and that was already included in section 5.2. But both Bernard and I wanted to see the "silly states" chapter sharpened up and the real alternatives sorted out and specified.
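As a sketch, an offer under this scheme could look like the following (the humintlang-send attribute name is assumed from the current draft; the tags and the bracketed annotations are only my illustration, and the annotations are not SDP syntax):

    m=video 49170 RTP/AVP 31
    a=humintlang-send:ase-Zxxx   [view of a person signing, American Sign Language]

    m=video 49172 RTP/AVP 31
    a=humintlang-send:en         [English text captions, no script subtag]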

I'm not sure what #2 really means. Shouldn't text captions be indicated by the
written language rather than the spoken language? And I'm not sure what
"spoken/written language" means.
#2 was: "

2.    Text captions included in the video stream SHALL be indicated
by a Language-Tag for spoken/written language."

Yes, the intention is to use written language in the video stream. There are
technologies for that.
I'm aware of that. My concern is that in this case "spoken/written" is applied to "text captions", which are not spoken by definition. This section is talking about the differences between identifying spoken and written language. The text captions fall on the written side of the equation, no?

I'd probably prefer to see something like "2. Text captions included in the video stream SHOULD include a Language-Tag to identify the language."
Yes, that is a way to avoid mentioning "spoken/written", which is apparently confusing when the tag in this case is used for the written modality.

Since the language subtags in the IANA registry are combined for spoken
languages and written languages, I call them Language-Tags for spoken/written
language.
The language subtags are for languages--all modalities. My comment here is that "spoken/written" adds no information.
Spoken/written is different from signed, which is the "normal" modality for video.

It would be misleading to say that we use a Language-Tag for a written
language, because the same tag could in another context mean a spoken
language.
One uses a Language-Tag to indicate the language. When the text is written, sometimes the user will pick a different language tag (zh-Hant-HK) than they might choose for spoken content (yue-HK, zh-cmn-HK, etc.). Sometimes (actually, nearly all the time except for special cases) the language tag for the spoken and written language is the same tag (en-US, de-CH, ja-JP, etc.). Again, the modality of the language is a separate consideration from the language. Nearly always, it is better to use the same tag for both spoken and written content rather than trying to use the tag to distinguish between them: different Content-Types require different decoders anyway, but it is really useful to say "give me all of the 'en-US' content you have" or "do you have content for a user who speaks 'es'?"
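(As a sketch of why: under basic filtering in RFC 4647, matching is done by prefix, so inserting script subtags changes what a range matches:

    range "en"     matches  en, en-GB, en-US, en-Zxxx-US, ...
    range "en-US"  matches  en-US, but not en-Zxxx-US

which is one more reason not to rewrite tags to signal modality.)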

Since we have the script subtag Zxxx for non-written content, we do not need to
construct an explicit tag for written language; it should be sufficient to specify
that use in our case.
In case it isn't clear above, I oppose introducing the 'Zxxx' subtag save for cases where the non-written nature of the content is super-important to the identification of the language.
There is a possible alternative in RFC 4796, the SDP Content attribute, where "speaker" can be identified. But that does not easily allow for describing alternative use of video for sign language or view of a speaking person.
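For reference, such a description looks something like this (attribute and value per RFC 4796):

    m=video 49170 RTP/AVP 31
    a=content:speaker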

So, the alternative to using Zxxx is, as I see it, not being able to specify text in the video stream. Good interoperability of text in the text stream is so much more important that I am prepared to go that way if needed. Bernard's view would be interesting here.
In my most recent proposal, I still have very similar wording. Since you had
problems understanding it, there might still be a need to tune it. Can you
propose wording?
This is the current proposal:

"   2.    Text captions included in the video stream SHOULD be indicated
    by a humintlang attribute with Language-Tag for spoken/written language.
"
I did that above. I think it is useful not to over-think it. When I see "Content-Type: video/mpeg; Content-Language: en-GB", I rather expect audio content in English and not written content (although the video stream might also include pictures of English text such as the titles in a movie). When, as in this case, setting up a negotiated language experience, interoperability is most aided by matching the customer's language preferences to available resources. This is easiest when customers do not get carried away with complex language tags (ranges in BCP 47 parlance, e.g. tlh-Cyrl-AQ-fonupa) and systems do not have to introspect the language tags, inserting and removing script subtags to match the various language modes.

Addison
/Gunnar

_______________________________________________
SLIM mailing list
SLIM@xxxxxxxx
https://www.ietf.org/mailman/listinfo/slim

--
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@xxxxxxxxxx
+46 708 204 288