Re: [Last-Call] Artart last call review of draft-ietf-calext-jscontact-07

Sorry for not replying sooner. Currently offline, so I can't check the actual wording.

On 2023-07-04 16:27, Robert Stepanek wrote:
On Tue, Jul 4, 2023, at 8:52 AM, Carsten Bormann wrote:
On 4. Jul 2023, at 08:47, Robert Stepanek <rsto@xxxxxxxxxxxxxxxx> wrote:

The "dir" attribute just contains the equivalent markup for Unicode sequences such as RLI ... PDI. Any text in JSContact is a UTF-8 encoded string which can contain the Unicode Bidi_Control code points, so there is no need for markup.

That is certainly one way to do this.
So you assume all RTL text contains these sequences?  You maybe should say so.

I do not assume that, it's up to implementations if they add those sequences and render them accordingly.

Implementations are expected to interoperate, so we need to be a bit more precise.

For the individual text pieces (e.g. a surname or the street part of an address), the above is true. But the spec should say very explicitly that for every opening tag, there should also be a corresponding closing tag. And the spec should also say that for unidirectional text pieces (e.g. a given name with only Arabic or only Hebrew letters, plus neutrals such as spaces in between), no Bidi control characters are needed.
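
To illustrate the two requirements, here is a minimal Python sketch. It is only an illustration, not text from the draft. The function names are hypothetical. The pairing check is deliberately stricter than UAX #9, which tolerates unmatched controls. The direction test looks only at the strong bidi classes (L, R, AL) and ignores digits.

```python
import unicodedata

# Bidi control characters as defined in UAX #9.
ISOLATE_OPEN = {"\u2066", "\u2067", "\u2068"}        # LRI, RLI, FSI
PDI = "\u2069"                                       # closes an isolate
EMBED_OPEN = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
PDF = "\u202c"                                       # closes an embedding/override


def bidi_controls_balanced(text):
    """Return True if every opening control has a matching closing one.

    Hypothetical strict check: controls must be properly nested.
    """
    expected = []  # stack of closing characters we still expect
    for ch in text:
        if ch in ISOLATE_OPEN:
            expected.append(PDI)
        elif ch in EMBED_OPEN:
            expected.append(PDF)
        elif ch in (PDI, PDF):
            if not expected or expected.pop() != ch:
                return False
    return not expected


def strong_directions(text):
    """Return the set of strong directions ("LTR", "RTL") present in text."""
    found = set()
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            found.add("LTR")
        elif bidi_class in ("R", "AL"):
            found.add("RTL")
    return found


def needs_bidi_controls(text):
    """Assumption: only text mixing LTR and RTL letters may need controls."""
    return len(strong_directions(text)) > 1


if __name__ == "__main__":
    arabic_only = "\u0645\u062d\u0645\u062f \u0645\u062d\u0645\u062f"
    mixed = "David \u05d3\u05d5\u05d3"
    print(needs_bidi_controls(arabic_only), needs_bidi_controls(mixed))
    print(bidi_controls_balanced("\u2067\u05e9\u05dc\u05d5\u05dd\u2069"))
```

A validator along these lines could flag an unterminated RLI in a single component before it affects the display of neighbouring components.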

That's not different than with any other Unicode sequences. We only added a reference to TR-9 because we got asked during review about bidirectional text. Maybe explicitly mentioning this part of Unicode brings up more confusion, and we might rather highlight instead that text may be any valid UTF-8 encoded Unicode.

How do these sequences compose, e.g., when building a name from its components?

The document recommends to concatenate the components string values in order. That should work fine with properly beginning and ending formatting characters.
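
As a sketch of what "properly beginning and ending formatting characters" could look like during concatenation, the snippet below wraps each component in an isolate pair before joining them in logical order. The use of FSI (U+2068) rather than LRI/RLI, the function names, and the space separator are all assumptions made for illustration. None of them is taken from the draft.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE: direction follows the first strong character
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes the isolate


def isolate(component):
    """Wrap one component so its direction cannot affect its neighbours."""
    return f"{FSI}{component}{PDI}"


def join_components(components, separator=" "):
    """Concatenate components in logical order, each in its own isolate.

    Hypothetical helper: the stored (logical) order is unchanged; only
    paired formatting characters are added around each component.
    """
    return separator.join(isolate(c) for c in components)


if __name__ == "__main__":
    full_name = join_components(["Jane", "\u05d3\u05d5\u05d3"])
    # Every opening FSI has a closing PDI, so the result stays balanced.
    print(full_name.count(FSI) == full_name.count(PDI))
```

This only guarantees that each component is self-contained. It does not decide the visual order of the components themselves, which depends on the base direction chosen for display.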

"Concatenate the components string values in order" works well for the internal logical storage. But for display, the question is which order. LTR or RTL? The right answer may be a mixture. If you have several pieces that are RTL, it's easier to read them in RTL order. Same for several pieces in LTR order. But if you have an RTL component at the end of the name part, and another RTL component at the start of the address part, and you display the whole thing inline, it's unclear whether the reader wants the name pieces and the address pieces clearly separated (leading to more 'jumps' of the reading sequence) or wants subsequent pieces that read the same way in the respective order independent of whether that visually interleaves e.g. name and address parts.

We had very similar questions when discussing the display of bidi IRIs (the bidi solution in RFC 3987 is just one way of doing things, not necessarily the preferred one for all users and all kinds of sequences).

And for names/addresses, the addressee may also have a preference (i.e. "I write my name with the components LTR" or "I write my name with the components RTL"). But I am not sure whether mixed-direction (e.g. mixed-script) names are actually "a thing" in the relevant regions. If not, or "not really", then it might be worth putting something into the spec to recommend avoiding mixed-direction data. I don't remember the details, but it would be okay to have e.g. an English and a Hebrew locale, as long as the English locale is all Latin/LTR and the Hebrew locale is all Hebrew (script)/RTL. In that case, neither DIR attributes nor bidi control characters would be needed.

Regards,   Martin.



