Hello Carsten,
Sorry for the delay in replying.
On 2022-06-28 18:51, Carsten Bormann wrote:
Hi Martin,
This explanation helps in that it confirms that overall, there are three different values and one case of no value, resulting in four different choices. So we are in agreement about the 'facts'. (originally, it wasn't clear to me whether null and absent were two different choices, or one and the same).
What we don't yet agree on is how this is presented. You want to present it as three values and one separate case/choice of an absent value.
Yes, because choosing a value is very different from supplying one or not supplying one.
I still think that from an implementer's point of view, from a user's
point of view, and from an information-theoretic point of view, it's four
choices. That three of these choices take up one byte each, whereas the
fourth choice takes up 0 bytes, is in my view a detail. But anyhow, I think
the new text in -07 makes it much more difficult to be confused about
whether there are three or four choices; -06 made it very difficult to
tell, because absence was mentioned before a null value, and both were
together in the same sentence, separated only by a semicolon.
Thinking from the perspective of a user, I'd strongly prefer it if all four choices were listed in one single list (or table, or whatever). That way, it's immediately clear what the choices are, each with its meaning. For a user, exactly how a choice gets expressed (by a value, or by the absence of a value) is in my view secondary, and shouldn't be made the top-level branching point for the presentation of these four choices.
But the choice is only between the three values, as presence/absence ultimately leads to one of the three values by inheritance.
Base-rtl and base-lang are specified to apply to unadorned (non-tag-38) strings.
It is not defined in the current text whether an implementation would apply base-rtl to tag-38 strings with absent directionality.
I think it's bad practice to leave things like this open. The only result it can produce is confusion, and nobody benefits from that. Why not just say that base-rtl and base-lang apply if language or direction are missing on a tag-38?
Because tag 38 operates independently of the enclosing problem-details document.
It needs to be possible to implement tag 38 in its own library.
If that happens to have a way to supply default parameters, these could be taken from base-rtl/-lang in the calling problem-details implementation.
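To make the "default parameter" idea concrete, here is a minimal sketch in Python of how a standalone tag-38 helper might collapse the four sender choices (three values, or nothing supplied) into one of three effective values. The names `Direction`, `ABSENT` and `effective_direction` are hypothetical and not taken from the draft, and the wire encoding of the values is deliberately not modelled. Whether the default should come from base-rtl is exactly the point under discussion, so the sketch only shows the mechanism.

```python
from enum import Enum


class Direction(Enum):
    LTR = "ltr"
    RTL = "rtl"
    AUTO = "auto"  # what a null direction stands for


# Hypothetical sentinel meaning "no direction was supplied at all".
ABSENT = object()


def effective_direction(direction=ABSENT, default=Direction.AUTO):
    """Collapse the four sender choices into one of three values.

    `default` is an assumed library parameter; a calling
    problem-details implementation could pass its base direction here.
    """
    if direction is ABSENT:
        # Fourth choice: nothing supplied, so inherit the caller's default.
        return default
    if not isinstance(direction, Direction):
        raise TypeError("direction must be a Direction or ABSENT")
    # An explicit value always wins over the default.
    return direction
```

With this shape, `effective_direction(default=Direction.RTL)` yields `Direction.RTL`, while `effective_direction(Direction.AUTO, default=Direction.RTL)` stays `Direction.AUTO`, which is the distinction between absent and null discussed above.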
There's really no point in going halfway in specifying bidi for problem
details. What you want is that senders (e.g. servers sending a problem
detail; ultimately programmers/translators that prepared the text(s) in
that problem detail) have the confidence that the text(s) they prepared
get displayed the way they intended so the reader on the client side
will understand them without having to guess what was messed up.
An implementation of tag 38 "in its own library" will most probably come
with a way to supply a default parameter; otherwise, the implementers
haven't really read the spec.
Of course there will be implementations on constrained devices that
completely ignore directionality because they are not intended for the
Middle-Eastern market. They will not contain fonts for Arabic or Hebrew
either, and that's just fine, but a separate consideration.
[…]
We should make sure that the text of the RFC-to-be encourages 3), not 1) or 2).
To me, (3) is really “follow the flow”, which is just a short way of saying “the reader is advised to consult ongoing standardization activities”.
There are no ongoing standardization activities on the bidi algorithm. If you say that a string of text is in RTL (or LTR) isolation, or in first strong isolation, what that means is defined by Unicode Standard Annex #9, which doesn't "flow" at all.
Indeed, that’s why we now reference Annex #9 in all three places.
However, the reference to FSI is weaker, as this is just one “auto” algorithm, and the document leaves handling “auto” more open than “ltr” and “rtl” (which are now quite sharp based on your help).
I still don't understand this. What's the point of a standard if stuff
is left open? What we want is that somebody who creates/translates the
text of a problem detail is able to express precisely what their
understanding/expectations with regard to directionality are. Just
saying "something automatic may happen, and it may be equivalent to
first strong isolation, or it may be something else" doesn't guarantee
that; the only thing it guarantees is a mess.
I agree that there are other "auto" algorithms, but they are really,
really not popular. The only "auto" algorithm that TR #9 uses is first
strong (as already explained in detail in an earlier mail, see below).
It's also the easiest to implement, in particular on constrained devices.
[…]
But what's much more important here is that the solution is NOT to go read STRING-META, because STRING-META won't help. STRING-META shows different ways one could indicate directionality in different cases. You already made your choice, so STRING-META is no longer relevant. What counts is the Unicode Bidirectional Algorithm.
We left in a weaker reference (Readers interested in further details […] may want to consult […]) to STRING-META as that is still useful background material for further reading.
This is still misleading, because readers of your document don't have much of a reason to read that document. The main reason may be to answer questions such as "why do we need such a directionality indicator", but if that's the case, your text should be more specific.
OK, I took out the “for further reading” paragraph.
Very good, thanks.
As an implementer, I’d probably have benefitted from some background material that explains enough so it can take me off the knee-jerk “what the heck is this” reaction.
But for people who already know a bit about this space, too much information may indeed be confusing.
If you want some background for implementers, which I think is a
reasonable concern, then there are other documents better suited than
STRING-META (which is mainly for spec writers).
A quick search on https://www.w3.org/International/ brought up e.g. the
following:
https://www.w3.org/International/articles/strings-and-bidi/
https://www.w3.org/International/articles/lang-bidi-use-cases/
https://www.w3.org/International/articles/inline-bidi-markup/uba-basics
But there might be even better resources.
Regards, Martin.
[…]
Hence the informative reference.
[…]
I'm not really sure yet about the 'absent' and 'null' entries, neither whether they are really distinct nor whether the specification is good enough (we might want to specify FIRST STRONG ISOLATE semantics).
We could, but I’m not sure that part of “auto” semantics is as stable as the rest.
In TR #9, the auto semantics is as stable as the others. FSI (first strong isolate) was introduced in Unicode 6.3 together with the other isolates. And the "first strong" rule has been part of the Bidi Algorithm from the start and is still there today for the overall paragraph direction (see https://www.unicode.org/reports/tr9/#P2). That also means that you get exactly these semantics if you just put every Tag 38 text on its own line (paragraph), e.g. in Windows Notepad. That also means that the average user of an RTL script is familiar with this behavior and knows what to do if it doesn't do the right thing.
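As an illustration of how small the "first strong" rule is, here is a simplified sketch in Python using the standard `unicodedata` module. The function name `first_strong_direction` and its `fallback` parameter are made up for this example. It is not a full implementation of rule P2: in particular, it does not skip characters that sit between an isolate initiator and its matching PDI.

```python
import unicodedata


def first_strong_direction(text, fallback="ltr"):
    """Return "ltr" or "rtl" based on the first strong character.

    Simplified: characters inside nested isolates are not skipped,
    which the full rule in UAX #9 requires.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character found (e.g. only digits or punctuation).
    return fallback
```

Digits, spaces and punctuation are not strong characters, so a string that starts with a number followed by Arabic text still resolves to "rtl".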
FSI is now explicitly mentioned as a choice that an internationalization library could take.
This is progress. But the text currently says (copied directly from github):
- `null` indicates that that no indication is made about the direction
("auto"), enabling an internationalization library to make a
decision such as treating the string as if enclosed in FSI ... PDI
or equivalent, see {{-bidi}}.
This is still too vague. What about
- `null`: Indicates auto-detection of direction.
The text is expected to be displayed with the base direction
determined by the directionality of the first strong character
if standalone, and isolated with first strong detection
(as if enclosed in FSI ... PDI or equivalent, see {{-bidi}})
in the context of a longer string or text.
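The "as if enclosed in FSI ... PDI" wording can be sketched as follows in Python, for the case where a string is embedded in a longer text. The helper name `isolate` and its string-valued `direction` parameter are assumptions made for this example; only the four code points come from Unicode.

```python
# Unicode directional isolate controls.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text, direction=None):
    """Wrap text in an isolate matching its declared direction.

    direction is "ltr", "rtl", or None for auto-detection
    (hypothetical parameter shape, chosen for this sketch).
    """
    initiators = {"ltr": LRI, "rtl": RLI, None: FSI}
    if direction not in initiators:
        raise ValueError("direction must be 'ltr', 'rtl' or None")
    return initiators[direction] + text + PDI
```

A display layer that runs the Unicode Bidirectional Algorithm then treats the wrapped string as a unit that does not affect, and is not affected by, the direction of the surrounding text.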
I don’t think we want to make FSI the prescribed meaning of the third branch here, so I wouldn’t want to make this change. I do like the term “auto-detection” though...
All the changes from this round in https://github.com/core-wg/core-problem-details/pull/42 — I’m expecting this to be the last round.
Grüße, Carsten
--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call