Re: [Last-Call] [art] Artart last call review of draft-ietf-core-problem-details-05


 



Hello Carsten, others,

On 2022-06-27 17:56, Carsten Bormann wrote:
Hi Martin,

thank you for the additional input.
I’ll focus on the discussion that led us to proposing another set of clarifications:

https://github.com/core-wg/core-problem-details/pull/41

This also looks mostly okay.

By chance, I just noticed that you have
keyword: CoAP, API, Problem Details
Would it be possible to add any keywords related to language tagging and/or bidi?

More comments below.

[…]

[snip]
Directionality Information
==========================

is also a technical term in the Bidi Algorithm]

I think this text is very important, so I'll go into some detail. First (minor nit), it says "If the third element is absent ...". Because this is in a paragraph that starts with "The optional third element ...", I think it would be better to say "If this element is absent ...".
Replaced by (a form of) your text…

Great progress, but I think we need a bit more progress, or at least some more careful checking.


https://github.com/core-wg/core-problem-details/pull/40/commits/bd588b9

Next, let me make sure that I get this right: This is a Boolean value, but it can in effect have four different states, yes? That would be:
- True (rtl)
- False (ltr)
- null (no indication about direction, but overriding any context)
- absent (no indication about direction, but context may apply)
If that's true, then it might be good to put that into a more structured form (something like the above list).
Thanks, see below. (A value that is absent is not a value; its representation by a null value may be needed to ~~override~~ reset any context available.)

In one of the patches, you collapsed my four-point list to three points.

This is really a ternary setting, with `ltr` (represented by false), `rtl` (by true), and `auto` (by null).
The fourth case is that the setting is not given, i.e., absent.
The default value for that case is taken from the context (not specified where that would come from) that overrides (sorry) this default.
If there is no context, the default default is `auto`.

This explanation helps in that it confirms that overall, there are three different values and one case of no value, resulting in four different choices. So we are in agreement about the 'facts'. (Originally, it wasn't clear to me whether null and absent were two different choices, or one and the same.)

What we don't yet agree on is how this is presented. You want to present it as three values and one separate case/choice of an absent value.

Thinking from the perspective of a user, I'd strongly prefer it if all four choices were listed in one single list (or table, or whatever). That way, it's immediately clear what the choices are, each with its meaning. For a user, the exact way of how a choice gets expressed (by a value, or by the absence of a value) is in my view secondary, and shouldn't be made the top-level branching point for the presentation of these four choices.
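To illustrate presenting all four choices in one list, here is a minimal Python sketch that maps the content of a tag-38 array to exactly one of them. It assumes a CBOR decoder has already turned the tag content into a Python list of two or three elements, with CBOR true/false/null mapped to True/False/None; the names `Direction` and `decode_direction` are hypothetical.

```python
from enum import Enum
from typing import Any, List


class Direction(Enum):
    """The four choices for the optional third element of a tag-38 array."""
    RTL = "rtl"        # third element is true
    LTR = "ltr"        # third element is false
    AUTO = "auto"      # third element is null: no indication, resets any context
    ABSENT = "absent"  # no third element: context (if any) may apply


def decode_direction(tag38: List[Any]) -> Direction:
    """Map [lang, text] or [lang, text, dir] to one of the four choices."""
    if len(tag38) == 2:
        return Direction.ABSENT
    if len(tag38) != 3:
        raise ValueError("tag-38 content must have 2 or 3 elements")
    dir_value = tag38[2]
    # Identity checks so that e.g. 0 or 1 are not mistaken for booleans.
    if dir_value is True:
        return Direction.RTL
    if dir_value is False:
        return Direction.LTR
    if dir_value is None:
        return Direction.AUTO
    raise ValueError("direction must be true, false, or null")
```

A single enum with four members is one way to give a user the flat list of choices, while the decoder still distinguishes "null" from "absent" by array length.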


I'm still not sure I really get this thing with absent and null. Let's say we have the following two problem details (very sketchy, obviously not the right syntax):

First variant
-------------
- problem-details
  - title
    - lang: de
    - text: Das ist ein Titel
    [dir absent]
  - base-rtl: true

Second variant
--------------
- problem-details
  - title
    - lang: de
    - text: Das ist ein Titel
    - dir: null
  - base-rtl: true

Here are my questions: Is there a difference between the first and the second variant?

Base-rtl and base-lang are specified to apply to unadorned (non-tag-38) strings.
It is not defined in the current text whether an implementation would apply base-rtl to tag-38 strings with absent directionality.

I think it's bad practice to leave things like this open. The only result it can produce is confusion, and nobody benefits from that. Why not just say that base-rtl and base-lang apply if language or direction are missing on a tag-38?
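A minimal Python sketch of the rule proposed just above (base-rtl applies when a tag-38 string carries no direction element), applied to the two variants. This encodes the proposal in this message, not the current draft text; the dictionary key "base-rtl" is a hypothetical stand-in for the key the draft actually assigns, and `effective_direction` is an illustrative name.

```python
from typing import Any, Dict, List, Optional


def effective_direction(tag38: List[Any], problem_details: Dict[str, Any]) -> str:
    """Return 'rtl', 'ltr' or 'auto' for a tag-38 string in problem details.

    Proposed rule (hypothetical): an explicit direction wins; an explicit
    null means auto and does NOT inherit base-rtl; only an absent direction
    falls back to base-rtl.
    """
    if len(tag38) == 3:
        dir_value = tag38[2]
        if dir_value is None:
            return "auto"
        return "rtl" if dir_value else "ltr"
    # Direction absent: use the context, here the (stand-in) base-rtl entry.
    base_rtl: Optional[bool] = problem_details.get("base-rtl")
    if base_rtl is None:
        return "auto"  # no context: the default default is auto
    return "rtl" if base_rtl else "ltr"


first = {"title": ["de", "Das ist ein Titel"], "base-rtl": True}
second = {"title": ["de", "Das ist ein Titel", None], "base-rtl": True}
```

Under this rule the first variant resolves to "rtl" (inherited from base-rtl) and the second to "auto", so the two variants would differ.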

Saying "its representation by a null value may be needed to reset any context available." seems to suggest that there is a difference; inheritance ("Das ist ein Titel" being RTL) when absent, and no inheritance ("Das ist ein Titel" being of undefined directionality) when null. If that's true, why did you reduce the four choices to three? If it's not true, why not?

Because there are only three values, and there is one way (absent) to obtain a default from that set of three in the absence of context indicating one of those three values.

Three values, four ways/choices/cases, again.

[very major point] The main problem is with the last sentence. There's not much of a point in defining a field for directionality if it's not clear what that is supposed to be used for. I'm also not sure where the claim "the proper processing of Language and Direction Metadata is an active area of investigation" came from, and why it is here.
I believe this statement is rather important, as it does spell out the requirement to stay abreast of the developments in this space. The tag 38 information provides an input to the algorithm that we just need to assume will survive revisions to that algorithm; but the algorithm may be revised.

Do you mean the Unicode Bidirectional Algorithm? It indeed gets reissued with every new Unicode version, which means roughly once every year. That's just how the Unicode consortium works, something between "living standard" and RFCs that are stable as long as nobody has the time to write an update.

Good point; I copied the annotation for Unicode-14.0.0 to Unicode-14.0.0-bidi.

Okay.

But if you look at the substance (going back from
https://www.unicode.org/reports/tr9/tr9-45.html version by version by changing '45' to lower and lower values), you'll see that there is exactly one major change, at Unicode Version 6.3 in 2013 (https://www.unicode.org/reports/tr9/tr9-29.html), where isolates (LRI and RLI) were introduced. And that change was years in the making, with several talks at Internationalization and Unicode Conferences about the problems posed by embeddings (LRE and RLE). There's no such change in sight that I'm aware of currently.

So the sentence "Note that the proper processing of Language and Direction Metadata is an active area of investigation; the reader is advised to consult ongoing standardization activities such as [STRING-META] when processing the information represented in this tag."
will produce one of two effects, both highly undesired:
1) An implementer who seriously wants to do the right thing will get lost in the woods.
2) An implementer inclined to cut corners will just ignore the whole directionality stuff.

Of course, the right thing for an implementer who just wants to make sure the text pieces get displayed so that they are easy for a user to read is
3) to just rely on a bidi library (usually just by sending the right pieces of text and bidi control characters or markup or whatsoever to the display engine in the underlying OS or so).

We should make sure that the text of the RFC-to-be encourages 3), not 1) or 2).
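Option 3) can be as small as wrapping each piece of text in Unicode isolate controls and leaving the rest to the display engine. A minimal Python sketch, assuming the tag-38 direction has already been resolved to True, False, or None, and that the underlying display engine implements UAX #9; `isolate` is a hypothetical helper name.

```python
from typing import Optional

# Unicode directional isolate controls (UAX #9).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str, rtl: Optional[bool]) -> str:
    """Wrap text in an isolate matching a tag-38 direction value.

    True -> RLI ... PDI, False -> LRI ... PDI,
    None ("auto") -> FSI ... PDI (direction from the first strong character).
    """
    if rtl is True:
        opener = RLI
    elif rtl is False:
        opener = LRI
    else:
        opener = FSI
    return opener + text + PDI


# Example: embed an RTL title in a larger LTR message for display.
line = "Title: " + isolate("שלום עולם", True)
```

The point of the isolate is that the embedded text cannot reorder its surroundings, which is what the display engine needs to get the overall line right.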

To me, (3) is really “follow the flow”, which is just a short way of saying “the reader is advised to consult ongoing standardization activities”.

There are no ongoing standardization activities on the bidi algorithm. If you say that a string of text is in RTL (or LTR) isolation, or in first strong isolation, what that means is defined by Unicode Standard Annex #9, which doesn't "flow" at all.


But I understand that one wouldn’t want to expose the reader to all that complexity if they can avail themselves of a library that encapsulates and hides that complexity.
So the PR makes it clearer that it is up to such a library to decide the detailed semantics of “auto”.

(In constrained devices, there often won’t be such a library.)

First, reference code is available (at https://www.unicode.org/Public/PROGRAMS/BidiReferenceC/14.0.0/), but that code is written for clarity, not for space or speed.

If there is no library or equivalent, the solution may be to go read TR #9, and figure out some ways to cut corners (as an example, an implementation may decide that a "fixed-size stack for exactly 63 elements" (see http://unicode.org/reports/tr9/#Paired_Brackets) uses too much memory, and use a smaller stack).
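To make the corner-cutting example concrete, here is a rough Python sketch of the bracket-pair identification in BD16 with the stack size as a parameter, so a constrained implementation could choose fewer than 63 entries. It treats the whole string as one isolating run sequence and uses a tiny hypothetical bracket table instead of the real Bidi_Paired_Bracket property (from BidiBrackets.txt), which Python's `unicodedata` module does not expose; canonical equivalents such as U+2329/U+3008 are not handled.

```python
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny table: opening bracket -> paired closing one.
OPENING: Dict[str, str] = {"(": ")", "[": "]", "{": "}"}
CLOSING = set(OPENING.values())


def bracket_pairs(text: str, max_depth: int = 63) -> List[Tuple[int, int]]:
    """Identify bracket pairs, roughly following UAX #9 BD16.

    max_depth is the stack size: BD16 uses 63; a constrained
    implementation might pick a smaller value.
    """
    stack: List[Tuple[str, int]] = []  # (expected closing bracket, position)
    pairs: List[Tuple[int, int]] = []
    for pos, ch in enumerate(text):
        if ch in OPENING:
            if len(stack) == max_depth:
                break  # no room: stop pairing for the rest of the text
            stack.append((OPENING[ch], pos))
        elif ch in CLOSING:
            # Search from the top of the stack downwards for a match.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == ch:
                    pairs.append((stack[i][1], pos))
                    del stack[i:]  # pop through the matching element
                    break
            # No match found: continue without popping the stack.
    pairs.sort()  # ascending by position of the opening bracket
    return pairs
```

With a smaller `max_depth`, deeply nested brackets simply stop being paired, which affects only how neutral brackets are resolved, not the rest of the algorithm.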

But what's much more important here is that the solution is NOT to go read STRING-META, because STRING-META won't help. STRING-META shows different ways of how one could indicate directionality in different cases. You already made your choice, so STRING-META is no longer relevant. What counts is the Unicode Bidirectional Algorithm.

We left in a weaker reference (Readers interested in further details […] may want to consult […]) to STRING-META as that is still useful background material for further reading.

This is still misleading, because readers of your document don't have much of a reason to read that document. The main reason may be to answer questions such as "why do we need such a directionality indicator", but if that's the case, your text should be more specific.

[…]

Hence the informative reference.
[…]
I'm not really sure yet about the 'absent' and 'null' entries, neither if they are really distinct nor whether the specification is good enough (we might want to specify FIRST STRONG ISOLATE semantics).
We could, but I’m not sure that part of “auto” semantics is as stable as the rest.

In TR #9, the auto semantics is as stable as the others. FSI (first strong isolate) was introduced in Unicode 6.3 together with the other isolates. And the "first strong" rule was already present from the start of the Bidi Algorithm and continues to be there until today for the overall paragraph direction (see https://www.unicode.org/reports/tr9/#P2). That also means that you get exactly these semantics if you just put every Tag 38 text on its own line (paragraph), e.g. in Windows Notepad. That also means that the average user of an RTL script is familiar with this behavior and knows what to do if it doesn't do the right thing.
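The "first strong" rule referred to here (P2/P3) is short enough to sketch directly. A minimal Python version using `unicodedata.bidirectional`, assuming the Unicode version bundled with the interpreter is 6.3 or later so that the isolate classes are reported; `first_strong_direction` is a hypothetical name. It skips text inside isolates as P2 requires and, as in P3, falls back to LTR when no strong character is found.

```python
import unicodedata

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def first_strong_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' following UAX #9 rules P2 and P3."""
    depth = 0  # nesting level of isolates currently being skipped
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            depth += 1
        elif bidi_class == "PDI":
            if depth > 0:
                depth -= 1  # close the innermost skipped isolate
        elif depth == 0:
            if bidi_class == "L":
                return "ltr"
            if bidi_class in ("R", "AL"):
                return "rtl"
    return "ltr"  # P3: no strong character found
```

For a standalone string this gives the base direction; inside a longer text, FSI ... PDI leaves the same decision to the display engine.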

FSI is now explicitly mentioned as a choice that an internationalization library could take.

This is progress. But the text currently says (copied directly from github):
- `null` indicates that that no indication is made about the direction
  ("auto"), enabling an internationalization library to make a
  decision such as treating the string as if enclosed in FSI ... PDI
  or equivalent, see {{-bidi}}.
This is still too vague. What about
- `null`: Indicates auto-detection of direction.
  The text is expected to be displayed with the base direction
  determined by the directionality of the first strong character
  if standalone, and isolated with first strong detection
  (as if enclosed in FSI ... PDI or equivalent, see {{-bidi}})
  in the context of a longer string or text.

Regards,   Martin.

I hope we can move forward with this latest PR.

Grüße, Carsten
.

--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call




