Re: [Last-Call] Artart last call review of draft-ietf-calext-jscontact-07

Martin J. Dürst <duerst@xxxxxxxxxxxxxxx> · Mon, 21 Aug 2023 01:02:44 +0900

Sorry for not replying sooner. Currently offline, so I can't check the 
actual wording.

On 2023-07-04 16:27, Robert Stepanek wrote:
> On Tue, Jul 4, 2023, at 8:52 AM, Carsten Bormann wrote:
>> On 4. Jul 2023, at 08:47, Robert Stepanek <rsto@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> The "dir" attribute is just the markup equivalent of Unicode sequences such as RLI ... PDI. Any text in JSContact is a UTF-8 encoded string, which can contain the Unicode Bidi_Control code points, so there is no need for markup.
>>
>> That is certainly one way to do this.
>> So you assume all RTL text contains these sequences? Maybe you should say so.
>
> I do not assume that; it's up to implementations whether they add those sequences and render them accordingly.

Implementations are expected to interoperate, so we need to be a bit 
more precise.

For the individual text pieces (e.g. a surname or the street part of an
address), the above is true. But the spec should say very explicitly
that for every opening Bidi control character (e.g. RLI), there should
also be a corresponding closing one (PDI). And the spec should also say
that for unidirectional text pieces (e.g. a given name with only Arabic
or only Hebrew letters, plus neutrals such as spaces in between), no
Bidi control characters are needed.
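Both requirements can be checked mechanically. A minimal sketch, assuming only the isolate controls (LRI, RLI, FSI, PDI) are in use; the function names are hypothetical, and the embedding/override controls (LRE, RLE, LRO, RLO, PDF) are deliberately not handled:

```python
import unicodedata

# Unicode directional isolate controls (UAX #9): LRI, RLI, FSI open, PDI closes.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
ISOLATE_OPENERS = {LRI, RLI, FSI}


def isolates_balanced(text: str) -> bool:
    """Return True if every LRI/RLI/FSI in text is closed by a later PDI
    and no PDI appears without a matching opener."""
    depth = 0
    for ch in text:
        if ch in ISOLATE_OPENERS:
            depth += 1
        elif ch == PDI:
            if depth == 0:
                return False  # stray PDI with nothing to close
            depth -= 1
    return depth == 0  # any unclosed opener makes the text unbalanced


def is_unidirectional(text: str) -> bool:
    """Return True if text does not mix strong LTR and strong RTL characters.
    Neutrals (spaces, punctuation) and weak types (digits) are ignored."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    has_ltr = "L" in classes
    has_rtl = bool(classes & {"R", "AL"})  # AL = Arabic letters, also RTL
    return not (has_ltr and has_rtl)
```

A component for which `is_unidirectional` returns True (e.g. a Hebrew-only given name) would need no control characters at all; anything that does contain controls should pass `isolates_balanced`.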

> That's no different from any other Unicode sequences. We only added a reference to TR-9 because we were asked about bidirectional text during review. Maybe explicitly mentioning this part of Unicode causes more confusion, and we should rather highlight that text may be any valid UTF-8 encoded Unicode.

>> How do these sequences compose, e.g., when building a name from its components?

> The document recommends concatenating the components' string values in order. That should work fine as long as formatting characters are properly opened and closed.

"Concatenate the components string values in order" works well for the 
internal logical storage. But for display, the question is in which
order: LTR or RTL? The right answer may be a mixture. If you have several
pieces that are RTL, it's easier to read them in RTL order; the same goes
for several pieces that are LTR. But if you have an RTL component at the
end of the name part and another RTL component at the start of the
address part, and you display the whole thing inline, it's unclear
whether the reader wants the name pieces and the address pieces clearly
separated (leading to more 'jumps' in the reading sequence), or wants
consecutive pieces with the same direction to read in that direction's
order, even if that visually interleaves, e.g., name and address parts.
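The "concatenate in order" approach can at least be made safe for the logical storage by isolating each component. A minimal sketch, assuming FSI ... PDI wrapping is acceptable for every component; `join_isolated` is a hypothetical helper, not something the draft defines:

```python
# First Strong Isolate and Pop Directional Isolate (UAX #9).
FSI, PDI = "\u2068", "\u2069"


def join_isolated(components: list[str], separator: str = " ") -> str:
    """Concatenate components in logical order, wrapping each non-empty one
    in FSI ... PDI so its direction is taken from its own first strong
    character and its internal ordering cannot spill into its neighbors."""
    return separator.join(FSI + c + PDI for c in components if c)


full_name = join_isolated(["דוד", "Cohen"])
```

Note that isolation only keeps each piece internally intact; the visual order of the isolated pieces relative to each other is still set by the surrounding paragraph direction, so this does not answer the display-order question above.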

We had very similar questions when discussing the display of bidi IRIs 
(the bidi solution in RFC 3987 is just one way of doing things, not 
necessarily the preferred one for all users and all kinds of sequences).

And for names/addresses, the addressee may also have a preference (e.g.
"I write my name with the components LTR" or "I write my name with the
components RTL"). But I'm not sure whether mixed-direction (e.g.
mixed-script) names are actually "a thing" in the relevant regions. If
not, or "not really", then it might be worth putting something into the
spec that recommends avoiding mixed-direction data. (I don't remember the
details, but it would be okay to have e.g. an English and a Hebrew
locale, as long as the English locale is all Latin/LTR and the Hebrew
locale is all Hebrew script/RTL.) In that case, neither DIR attributes
nor bidi control characters would be needed.
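A recommendation like that could be validated per localized value. A minimal sketch, assuming the caller already knows whether a locale is RTL (mapping language tags to a direction is left out); the function name and sample data are hypothetical:

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left bidi classes


def matches_locale_direction(text: str, rtl_locale: bool) -> bool:
    """Return True if text contains no strong characters of the direction
    opposite to its locale; neutrals, digits and punctuation are ignored."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if rtl_locale and bidi == "L":
            return False
        if not rtl_locale and bidi in RTL_CLASSES:
            return False
    return True


# Hypothetical localized values for one address field: (text, is_rtl_locale).
localized = {"en": ("Tel Aviv", False), "he": ("תל אביב", True)}
all_ok = all(matches_locale_direction(t, rtl) for t, rtl in localized.values())
```

Values that pass this check are single-direction within their locale, which is exactly the case where no markup or control characters are required.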

Regards,   Martin.

--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call