Hello Carsten,
Glad to see lots of progress, both in the draft and in our mutual
understanding. As our fields of expertise are not very close, this
indeed requires quite some work :-(.
On 2022-06-30 00:40, Carsten Bormann wrote:
Hi Martin,
thank you for your further explanations.
Base-rtl and base-lang are specified to apply to unadorned (non-tag-38) strings.
It is not defined in the current text whether an implementation would apply base-rtl to tag-38 strings with absent directionality.
I think it's bad practice to leave things like this open. The only result it can produce is confusion, and nobody benefits from that. Why not just say that base-rtl and base-lang apply if language or direction are missing on a tag-38?
Because tag 38 operates independently of the enclosing problem-details document.
It needs to be possible to implement tag 38 in its own library.
If that happens to have a way to supply default parameters, these could be taken from base-rtl/-lang in the calling problem-details implementation.
There's really no point in going halfway in specifying bidi for problem details. What you want is that senders (e.g. servers sending a problem detail; ultimately programmers/translators that prepared the text(s) in that problem detail) have the confidence that the text(s) they prepared get displayed the way they intended so the reader on the client side will understand them without having to guess what was messed up.
If the generator knows what it wants, it can set ltr/rtl, that is easy.
The hard part is what it can do when it doesn’t.
The question here is who the 'generator' is. In my understanding, human
readable text is produced, at least initially, by humans. The way humans
work is to type text, look at how it displays, and intervene when they
are not happy with the display order. Intervening means setting the
(base) directionality of the text (snippet) in question to a specific
directionality. Not intervening means being okay with the implicitly
detected (base) directionality. That implicit detection in the bidi
algorithm is exactly the same as "first strong".
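As an illustration of "first strong", here is a minimal Python sketch. It is not taken from the draft or from any tag 38 implementation; the function name and the "ltr" fallback are assumptions made for the example. It scans for the first character whose bidi class is strong (L, R, or AL) and skips text nested inside isolates, roughly along the lines of the paragraph-level detection in the bidi algorithm.

```python
import unicodedata


def first_strong_direction(text, default="ltr"):
    """Return "ltr" or "rtl" based on the first strong character.

    Hypothetical helper for illustration only. Characters nested between
    an isolate initiator (LRI, RLI, FSI) and its matching PDI are skipped.
    """
    depth = 0  # current isolate nesting depth
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("LRI", "RLI", "FSI"):
            depth += 1
        elif bidi == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0:
            if bidi == "L":
                return "ltr"
            if bidi in ("R", "AL"):
                return "rtl"
    # No strong character found: fall back to the assumed default.
    return default
```

Digits and punctuation are not strong, so a string that starts with a number followed by Arabic text is still detected as right-to-left.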
Not specifying directionality (absent third element) is one way; by doing so, the data item generator essentially leaves everything to the data item consumer. Note that this allows the consumer to be wiser than the generator; having the generator dictate what the consumer must do in this case is counterproductive.
For the consumer, we do suggest defaulting absent to “auto” if no other context is available.
Explicit null (“auto”), again, means that the generator does not know enough to set ltr/rtl; but it does know that any additional context in the data item won’t help (e.g., because the controlled text is imported from a different environment than the text in the rest of the data item).
Here again, asking the generator to specify something it doesn’t know does not make sense.
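The distinction between an absent direction and an explicit null can be sketched in Python as follows. This is only one possible reading of the behaviour described above, not text from the draft: the sentinel ABSENT, the function name, the string values, and the context_default parameter (which a calling problem-details implementation might derive from base-rtl) are all hypothetical.

```python
ABSENT = object()  # hypothetical sentinel: the third element is missing


def resolve_direction(direction=ABSENT, context_default=None):
    """Resolve the effective direction for one tag 38 string.

    direction: "ltr", "rtl", None (explicit null), or ABSENT.
    context_default: optional "ltr"/"rtl" supplied by the caller.
    """
    if direction in ("ltr", "rtl"):
        # The generator knew what it wanted.
        return direction
    if direction is None:
        # Explicit null: surrounding context is declared unhelpful.
        return "auto"
    if direction is ABSENT:
        # Nothing stated: use caller context if any, else "auto".
        if context_default in ("ltr", "rtl"):
            return context_default
        return "auto"
    raise ValueError("unsupported direction value: %r" % (direction,))
```

With this shape, a standalone tag 38 library never needs to know about problem details; the caller decides whether to pass a default at all.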
The FSI..PDI suggestion is just a suggestion; “auto” does not “mean” FSI.
(If the generator knows the directionality of the text, it’ll set rtl or ltr and not auto.) You may argue this “leaves things open”; I would argue that this is the case where there isn’t enough information on the generator side to nail anything down.
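For readers unfamiliar with the isolate characters mentioned here, a consumer that chooses to follow the FSI..PDI suggestion might wrap a string as in the Python sketch below before inserting it into surrounding text. The function name and the "ltr"/"rtl"/"auto" labels are assumptions for the example; the code points are the Unicode isolate controls. As noted above, this is one option for a consumer, not what “auto” is defined to mean.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

_INITIATOR = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def isolate(text, direction="auto"):
    """Wrap text in the isolate pair matching the given direction.

    Hypothetical helper: "auto" maps to FSI here only because this
    sketch follows the suggestion; a smarter consumer could do otherwise.
    """
    return _INITIATOR[direction] + text + PDI
```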
It seems to me that your model of 'generator' is some automatic text
generation. Mine is a human writer or translator. (Except for the
special case of a blind person,) they will judge whether directionality
is okay or not visually.
An implementation of tag 38 "in its own library" will most probably come with a way to supply a default parameter; otherwise, the implementers haven't really read the spec.
Yes.
Of course there will be implementations on constrained devices that completely ignore directionality because they are not intended for the Middle-Eastern market. They will not contain fonts for Arabic or Hebrew either, and that's just fine, but a separate consideration.
Indeed, that is the problem, but one interesting case is where the constrained device doesn’t know much but the less-constrained correspondent does. The objective here is to make these asymmetric situations work.
The text in question somehow must have gotten into the constrained
device in the first place. In that case (and absent bad engineering,
which is always a possibility), it will have come with the relevant
directionality information. It's clear that a constrained device cannot
come with some very smart AI heuristics to fix up directionality, but
neither is a more powerful device supposed to do so, nor guaranteed to
get it right if it does so. So to me the distinction between constrained
devices and less-constrained devices doesn't seem to help here. (I'm
sure there are quite a few other cases where it makes a lot of sense.)
I agree that there are other "auto" algorithms, but they are really, really not popular. The only "auto" algorithm that TR #9 uses is first strong (as already explained in detail in an earlier mail, see below). It's also the easiest to implement, in particular on constrained devices.
You are comparing general algorithms. But one of the two peers might know more and be able to apply that knowledge. The current text for “auto” is designed to allow this.
Can you give an example of what you mean by "might know more"? My
understanding is still that the original writer/translator must know
best; the machines in the middle, whether very simple and small or very
powerful, don't know anything at all, and the recipient reader shouldn't
have to guess or 'know'.
If you want some background for implementers, which I think is a reasonable concern, then there are other documents better suited than STRING-META (which is mainly for spec writers).
A quick search on https://www.w3.org/International/ brought up e.g. the following:
https://www.w3.org/International/articles/strings-and-bidi/
https://www.w3.org/International/articles/lang-bidi-use-cases/
https://www.w3.org/International/articles/inline-bidi-markup/uba-basics
But there might be even better resources.
Thank you for the pointers. I’m planning to make a PR to draft-bormann-cbor-notable-tags.
The notable-tags draft is intended to explain more about usage and implementation of certain tags, including when to choose which.
It is probably a better place to include pointers to knowledge that is still growing, so I would expect us to direct further work there.
Sounds reasonable. It may make sense, in the long term, to move
the definition of tag 38 from problem-details to this draft. Anyway,
please give me (and the relevant lists!) a heads up when you have a new
version with some relevant text.
Regards, Martin.