Re: [Slim] IETF last call for draft-ietf-slim-negotiating-human-language (Section 5.4)

Randall Gellens <rg+ietf@xxxxxxxxxxxxxxxxx> · Wed, 15 Feb 2017 15:01:05 -0800

Hi Addison,

Please see in-line.

At 5:12 PM +0000 2/15/17, Addison Phillips wrote:

 Gunnar replied:

 On 2017-02-14 at 21:39, Phillips, Addison wrote:
 > I have some allergy to the SHALL language: there is no way to
 > automatically determine conformance. Many language tags represent
 > nonsensical values, due to the nature of language tag composition.
 > Content providers need to use care in selecting the tags that they use,
 > and this section is merely pointing out good guidance for tag
 > selection, albeit in a heavy-handed way. BCP 47 (RFC 5646) Section 4.1
 > [1] already provides most of this guidance, and a reference to that
 > source might be useful here, if only because that document requires it:
 >
 > <quote>
 >     Standards, protocols, and applications that reference this document
 >     normatively but apply different rules to the ones given in this
 >     section MUST specify how language tag selection varies from the
 >     guidelines given here.
 > </quote>
 >
 > I would suggest reducing the SHALL items to SHOULD.
 Accepted.
 That also opens up another use that we have discussed before but were
 advised not to use: indicating written language by attaching a script
 subtag, even if that script subtag is suppressed by BCP 47. We can drop
 that need, however, by using the Zxxx script subtag for non-written
 content, and clearly include that usage in our specification, as BCP 47
 requires.

 I don't necessarily think that mandating a script subtag as a 
signal of written content (vs. spoken content) is that useful. In 
most protocols, the written nature of the content is indicated by 
the presence of text. Trying to coerce the language modality via 
language tags seems complicated, especially since most language 
tags are harvested from the original source. Introducing processes 
to evaluate and insert or remove script subtags seems unnecessary 
to me. That said, I have no objection to content using script 
subtags if they are useful.

 >
 > I'm not sure what #2 really means. Shouldn't text captions be
 > indicated by the written language rather than the spoken language?
 > And I'm not sure what "spoken/written language" means.
 #2 was:

    "2.  Text captions included in the video stream SHALL be indicated
         by a Language-Tag for spoken/written language."

 Yes, the intention is to use written language in the video stream. There are
 technologies for that.

 I'm aware of that. My concern is that in this case "spoken/written"
is applied to "text captions", which by definition are not spoken.
This section is talking about the differences between identifying 
spoken and written language. The text captions fall into the 
written side of the equation, no?

Keep in mind that the focus of the draft is enabling language 
negotiation along with media negotiation for interactive 
communications.  In this context, as I noted in previous replies, 
real-time text captions for sign language in video are a service.  A 
mechanism for requesting services needs to be carefully thought out 
in the WG and not added to the current draft at the last minute.

 I'd probably prefer to see something like "2. Text captions 
included in the video stream SHOULD include a Language-Tag to 
identify the language."

 Since the language subtags in the IANA registry are shared by spoken
 and written languages, I call them Language-Tags for spoken/written
 language.

 The language subtags are for languages--all modalities. My comment 
here is that "spoken/written" adds no information.

 It would be misleading to say that we use a Language-Tag for a written
 language, because the same tag could in another context mean a spoken
 language.

 One uses a Language-Tag for indicating the language. When the text 
is written, sometimes the user will pick a different language tag 
(zh-Hant-HK) than they might choose for spoken text (yue-HK, 
zh-cmn-HK, etc.). Sometimes (actually, nearly all the time except 
for special cases) the language tag for the spoken and written 
language is the same tag (en-US, de-CH, ja-JP, etc.). Again, the 
modality of the language is a separate consideration from the 
language. Nearly always, it is better to use the same tag for both 
spoken and written content rather than trying to use the tag to 
distinguish between them: different Content-Types require different 
decoders anyway, but it is really useful to say "give me all of the 
'en-US' content you have" or "do you have content for a user who 
speaks 'es'?".

Requesting all content available in a specific language is outside 
the scope of the draft, which is enabling language negotiation along 
with media negotiation for interactive communications.  One key 
example is a user placing an emergency call.  The call setup can 
negotiate both language and media for interactive communication 
between the emergency services answering point and the user.

 Since we have the script subtag Zxxx for non-written content, we do not
 need to construct an explicit tag for written language; our
 specification of the usage in our case should be sufficient.

 In case it isn't clear above, I oppose introducing the 'Zxxx' subtag 
save for cases where the non-written nature of the content is 
super-important to the identification of the language.

 In my latest proposal, I still have very similar wording. Since you had
 problems understanding it, it might still need tuning. Can you propose
 wording? This is the current proposal:

    "2.  Text captions included in the video stream SHOULD be indicated
         by a humintlang attribute with Language-Tag for spoken/written
         language."

 I did that above. I think it is useful not to over-think it. When I 
see "Content-Type: video/mpeg; Content-Language: en-GB", I would 
expect audio content in English rather than written content (although 
the video stream might also include pictures of English text such 
as the titles in a movie). When, as in this case, setting up a 
negotiated language experience, interoperability is most aided by 
matching the customer's language preferences to available 
resources. This is easiest when customers do not get carried away 
with complex language tags (ranges in BCP 47 parlance, e.g. 
tlh-Cyrl-AQ-fonupa) and systems do not have to introspect the 
language tags, inserting and removing script subtags to match the 
various language modes.
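For illustration only: a minimal sketch of the kind of matching described above, assuming the "Lookup" scheme of RFC 4647 Section 3.4 (the draft under discussion does not prescribe a matching algorithm here). The function name `lookup` and the sample tags are hypothetical. Each range in the user's priority list is truncated from the end until it matches an offered tag, with no script subtags inserted or removed to reflect modality.

```python
# Hypothetical sketch of RFC 4647 Section 3.4 "Lookup": compare a user's
# language priority list against the tags of available resources by
# progressively truncating each range from the end. No script subtags are
# added or stripped to signal spoken vs. written modality.

def lookup(priority_list, available, default=None):
    """Return the best available tag for the priority list, or `default`."""
    # Language tags compare case-insensitively; keep the original spelling.
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        if language_range == "*":
            # The wildcard carries no information for Lookup; skip it.
            continue
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # A single-character subtag (extension or private-use singleton)
            # is removed together with the subtag that followed it.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


if __name__ == "__main__":
    offered = ["en", "es", "zh-Hant"]
    print(lookup(["de-CH", "en-US"], offered))                    # en
    print(lookup(["zh-Hant-HK"], offered))                        # zh-Hant
    print(lookup(["tlh-Cyrl-AQ-fonupa"], offered, default="en"))  # en
```

Because the same tag labels both spoken and written resources, one lookup serves both; the media type, not the tag, decides which decoder is needed.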

 Addison

 _______________________________________________
 SLIM mailing list
 SLIM@xxxxxxxx
 https://www.ietf.org/mailman/listinfo/slim

--
Randall Gellens
Opinions are personal;    facts are suspect;    I speak for myself only
-------------- Randomly selected tag: ---------------
The First Amendment is often inconvenient. But that is beside the
point.  Inconvenience does not absolve the government of its
obligation to tolerate speech. --Justice Anthony Kennedy, in 91-155