Re: [Last-Call] [art] Artart last call review of draft-ietf-core-problem-details-05

Carsten Bormann <cabo@xxxxxxx> · Mon, 27 Jun 2022 10:56:42 +0200

Hi Martin,

thank you for the additional input.
I’ll focus on the discussion that led us to proposing another set of clarifications:

https://github.com/core-wg/core-problem-details/pull/41

> […]
>> We have prepared a pull request at:
>> https://github.com/core-wg/core-problem-details/pull/40
> 
> I have looked at the pull request. It mostly looks good.

Thank you.

> See below for more details.
> 
> [snip]
>>> Directionality Information
>>> ==========================
> 
> is also a technical term in the Bidi Algorithm]
>>> 
>>> I think this text is very important, so I'll go into some details. First (minor nit), it says "If the third element is absent ...". Because this is in a paragraph that starts with "The optional third element ...", I think it would better say "If this element is absent ...".
>> Replaced by (a form of) your text…
> 
> Great progress, but I think we need a bit more progress, or at least some more careful checking.
> 
> 
>> ➔
>> https://github.com/core-wg/core-problem-details/pull/40/commits/bd588b9
>>> 
>>> Next, let me make sure that I get this right: This is a Boolean value, but it can in effect have four different states, yes? That would be:
>>> - True (rtl)
>>> - False (ltr)
>>> - null (no indication about direction, but overriding any context)
>>> - absent (no indication about direction, but context may apply)
>>> If that's true, then it might be good to put that into a more structured form (something like the above list).
>> Thanks, see below. (A value that is absent is not a value; its representation by a null value may be needed to ~~override~~ reset any context available.)
> 
> In one of the patches, you collapsed my four-point list to three points.

This is really a ternary setting, with `ltr` (represented by false), `rtl` (by true), and `auto` (by null).
The fourth case is that the setting is not given, i.e., absent.
In that case, the value is taken from the context (we do not specify where that context would come from), which overrides (sorry) the default.
If there is no context, the default default is `auto`.
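
As a minimal sketch of that resolution rule (not text from the draft): the function name `resolve_direction`, the `_ABSENT` sentinel, and the `context_default` parameter are hypothetical names chosen for illustration, and where a context value would come from remains unspecified.

```python
from typing import Optional

# Hypothetical sentinel: the third element was not given at all.
# This is distinct from an explicit null (None), which means `auto`.
_ABSENT = object()


def resolve_direction(third=_ABSENT, context_default: Optional[str] = None) -> str:
    """Map the optional third element to 'ltr', 'rtl', or 'auto'.

    `context_default` stands in for whatever context is available;
    its source is an assumption of this sketch.
    """
    if third is _ABSENT:
        # Absent: take the context if there is one, else the default default.
        return context_default if context_default is not None else "auto"
    if third is None:
        # Explicit null resets any context to `auto`.
        return "auto"
    if third is True:
        return "rtl"
    if third is False:
        return "ltr"
    raise ValueError("third element must be true, false, or null")
```

With this, `resolve_direction(context_default="rtl")` takes the context, while `resolve_direction(None, context_default="rtl")` ignores it; that is the only observable difference between absent and null.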

> I'm still not sure I really get this thing with absent and null. Let's say we have the following two problem details (very sketchy, obviously not the right syntax):
> 
> First variant
> -------------
> - problem-details
>  - title
>    - lang: de
>    - text: Das ist ein Titel
>    [dir absent]
>  - base-rtl: true
> 
> Second variant
> --------------
> - problem-details
>  - title
>    - lang: de
>    - text: Das ist ein Titel
>    - dir: null
>  - base-rtl: true
> 
> Here are my questions: Is there a difference between the first and the second variant?

Base-rtl and base-lang are specified to apply to unadorned (non-tag-38) strings.
It is not defined in the current text whether an implementation would apply base-rtl to tag-38 strings with absent directionality.

> Saying "its representation by a null value may be needed to reset any context available." seems to suggest that there is a difference; inheritance ("Das ist ein Titel" being RTL) when absent, and no inheritance ("Das ist ein Titel" being of undefined directionality) when null. If that's true, why did you reduce the four choices to three? If it's not true, why not?

Because there are only three values; absence is simply the one way to obtain a default from that set of three when no context indicates one of them.

>>> [very major point] The main problem is with the last sentence. There's not much of a point in defining a field for directionality if it's not clear what that is supposed to be used for. I'm also not sure where the claim "the proper processing of Language and Direction Metadata is an active area of investigation" came from, and why it is here.
>> I believe this statement is rather important, as it does spell out the requirement to stay abreast with the developments in this space. The tag 38 information provides an input to the algorithm that we just need to assume will survive revisions to that algorithm; but the algorithm may be revised.
> 
> Do you mean the Unicode Bidirectional Algorithm? It indeed gets reissued with every new Unicode version, which means roughly once every year. That's just how the Unicode consortium works, something between "living standard" and RFCs that are stable as long as nobody has the time to write an update.

Good point; I copied the annotation for Unicode-14.0.0 to Unicode-14.0.0-bidi.

> But if you look at the substance (going back from
> https://www.unicode.org/reports/tr9/tr9-45.html version by version by changing '45' to lower and lower values), you'll see that there is exactly one major change, at Unicode Version 6.3 in 2013 (https://www.unicode.org/reports/tr9/tr9-29.html), where isolates (LRI and RLI) were introduced. And that change was years in the making, with several talks at Internationalization and Unicode Conferences about the problems posed by embeddings (LRE and RLE). There's no such change in sight that I'm aware of currently.
> 
> So the sentence "Note that the proper processing of Language and Direction Metadata is an active area of investigation; the reader is advised to consult ongoing standardization activities such as [STRING-META] when processing the information represented in this tag."
> will produce one of two effects, both highly undesired:
> 1) An implementer who seriously wants to do the right thing will get lost in the woods.
> 2) An implementer inclined to cut corners will just ignore the whole directionality stuff.
> 
> Of course, the right thing for an implementer who just wants to make sure the text pieces get displayed so that they are easy for a user to read is
> 3) to just rely on a bidi library (usually just by sending the right pieces of text and bidi control characters or markup or whatsoever to the display engine in the underlying OS or so).
> 
> We should make sure that the text of the RFC-to-be encourages 3), not 1) or 2).

To me, (3) is really “follow the flow”, which is just a short way of saying “the reader is advised to consult ongoing standardization activities”.

But I understand that one wouldn’t want to expose the reader to all that complexity if they can avail themselves of a library that encapsulates and hides that complexity.
So the PR makes it clearer that it is up to such a library to decide the detailed semantics of “auto”.

(In constrained devices, there often won’t be such a library.)

We left in a weaker reference (Readers interested in further details […] may want to consult […]) to STRING-META as that is still useful background material for further reading.

> […]
> 
>> Hence the informative reference.
>>> […]
>>> I'm not really sure yet about the 'absent' and 'null' entries, neither if they are really distinct nor whether the specification is good enough (we might want to specify FIRST STRONG ISOLATE semantics).
>> We could, but I’m not sure that part of “auto” semantics is as stable as the rest.
> 
> In TR #9, the auto semantics is as stable as the others. FSI (first strong isolate) was introduced in Unicode 6.3 together with the other isolates. And the "first strong" rule was already present from the start of the Bidi Algorithm and continues to be there until today for the overall paragraph direction (see https://www.unicode.org/reports/tr9/#P2). That also means that you get exactly these semantics if you just put every Tag 38 text on its own line (paragraph) e.g. in Win notepad. That also means that the average user of an RTL script is familiar with this behavior and what to do if it doesn't do the right thing.

FSI is now explicitly mentioned as a choice that an internationalization library could take.
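
To illustrate what such a choice could look like, here is a rough sketch, not a definitive implementation. The isolate code points (LRI U+2066, RLI U+2067, FSI U+2068, PDI U+2069) are from Unicode; the function names `isolate` and `first_strong` are hypothetical, and `first_strong` is a simplification of the first-strong rule that does not skip nested isolates as UAX #9 does.

```python
import unicodedata
from typing import Optional

LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"


def isolate(text: str, direction: str) -> str:
    """Wrap text in the bidi isolate matching 'ltr', 'rtl', or 'auto'."""
    initiators = {"ltr": LRI, "rtl": RLI, "auto": FSI}
    return initiators[direction] + text + PDI


def first_strong(text: str) -> Optional[str]:
    """Return the direction of the first strong character, or None.

    Simplified: real implementations also skip characters inside
    nested isolates when searching for the first strong character.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None
```

A display engine that implements the Bidi Algorithm resolves FSI itself, so `isolate(text, "auto")` is usually all that is needed; `first_strong` only approximates that behavior for environments without such an engine.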

I hope we can move forward with this latest PR.

Grüße, Carsten
