Hi,
I'm still not convinced that there is anything Bidi-specific to add to this specification beyond what the Unicode Standard and the current specification already define.
I tried coming up with a paragraph that includes the recommendations outlined in your email. Every version ended up emphasizing a few select requirements of the Unicode specifications, making it unclear why these should be mentioned and not others.
On the point of requiring Unicode Bidi_Control closing tags: Unicode Standard Annex #9 already clearly defines what to do with string values that do not contain balanced PDF or PDI characters. Certainly we can reiterate the requirements of the Unicode Bidirectional Algorithm, but that feels kind of out of place to me. Would every RFC dealing with Unicode text do that?
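For illustration, a minimal Python sketch of a check an implementation could run if it chose to reject string values with unbalanced Bidi_Control characters. The function name and the strict nesting policy are hypothetical; UAX #9 itself does not reject such strings but resolves them according to its own rules.

```python
# Bidi_Control code points defined in UAX #9.
EMBEDDING_INITIATORS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
ISOLATE_INITIATORS = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def has_balanced_bidi_controls(text: str) -> bool:
    """Return True if every initiator in `text` is closed by its matching
    terminator, in properly nested order.

    Hypothetical policy, stricter than UAX #9: the algorithm tolerates
    unmatched initiators (their scope runs to the end of the paragraph)
    and ignores unmatched PDF/PDI, whereas this check rejects them.
    """
    stack: list[str] = []
    for ch in text:
        if ch in EMBEDDING_INITIATORS:
            stack.append("embedding")
        elif ch in ISOLATE_INITIATORS:
            stack.append("isolate")
        elif ch == PDF:
            # A PDF must close the innermost open embedding/override.
            if not stack or stack[-1] != "embedding":
                return False
            stack.pop()
        elif ch == PDI:
            # A PDI must close the innermost open isolate.
            if not stack or stack[-1] != "isolate":
                return False
            stack.pop()
    # Anything still open at the end is an unclosed initiator.
    return not stack
```

A Hebrew value wrapped as RLI ... PDI passes; the same value with the PDI missing, or with a PDI closing an RLE, fails.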
On the point of directionality across string values: the "components" list properties of the Name and Address objects are defined as follows: "components SHOULD be ordered such that their values joined as a String produce a valid full name/address of this entity. If so, implementations MUST set the isOrdered property value to "true"." That is all the specification needs to express a preference for a specific order of name or address components. How to render these components is out of scope of this specification.
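A minimal sketch of the quoted rule, assuming a parsed JSON object in which each component stores its text in a "value" member and any separators are components of their own (both assumptions of this sketch, not quoted from the draft). It deliberately says nothing about display order:

```python
from typing import Optional


def joined_full_name(obj: dict) -> Optional[str]:
    """Join the component values of a Name or Address object in order.

    Assumes each entry in "components" carries its text in a "value"
    member (hypothetical shape for this sketch). Returns None when
    "isOrdered" is not true, because the joined string is then not
    guaranteed to be a valid full name or address.
    """
    if obj.get("isOrdered") is not True:
        return None
    # Plain concatenation in storage (logical) order; rendering is out of scope.
    return "".join(component["value"] for component in obj.get("components", []))
```

For example, components "Jane", " " and "Doe" with isOrdered set to true join to "Jane Doe"; without isOrdered the function makes no claim and returns None.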
That's not to say that I'm generally opposed to adding Bidi-related information to this spec. I just still don't feel that the current definitions are missing something, nor that the Bidi points raised give a full picture of what might need to be addressed because the underlying Unicode spec doesn't cover it.
Regards,
Robert
On Sun, Aug 20, 2023, at 6:02 PM, Martin J. Dürst wrote:
Sorry for not replying sooner. Currently offline, so I can't check the actual wording.

On 2023-07-04 16:27, Robert Stepanek wrote:
> On Tue, Jul 4, 2023, at 8:52 AM, Carsten Bormann wrote:
>> On 4. Jul 2023, at 08:47, Robert Stepanek <rsto@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> The "dir" attribute just contains the equivalent markup for Unicode sequences such as RLI ... PDI. Any text in JSContact is a UTF-8 encoded string which can contain the Unicode Bidi_Control code points, so there is no need for markup.
>>
>> That is certainly one way to do this.
>> So you assume all RTL text contains these sequences? You maybe should say so.
>
> I do not assume that, it's up to implementations if they add those sequences and render them accordingly.

Implementations are expected to interoperate, so we need to be a bit more precise.

For the individual text pieces (e.g. a surname or the street part of an address), the above is true. But the spec should say very explicitly that for every opening tag, there should also be a corresponding closing tag. And the spec should also say that for unidirectional text pieces (e.g. a given name only with Arabic or only with Hebrew letters (plus neutrals such as spaces in between)) no Bidi control characters are needed.

> That's not different than with any other Unicode sequences. We only added a reference to TR-9 because we got asked during review about bidirectional text. Maybe explicitly mentioning this part of Unicode brings up more confusion, and we might rather highlight instead that text may be any valid UTF-8 encoded Unicode.
>
>> How do these sequences compose, e.g., when building a name from its components?
>
> The document recommends to concatenate the components string values in order. That should work fine with properly beginning and ending formatting characters.

"Concatenate the components string values in order" works well for the internal logical storage. But for display, the question is which order: LTR or RTL? The right answer may be a mixture.
If you have several pieces that are RTL, it's easier to read them in RTL order. Same for several pieces in LTR order. But if you have an RTL component at the end of the name part, and another RTL component at the start of the address part, and you display the whole thing inline, it's unclear whether the reader wants the name pieces and the address pieces clearly separated (leading to more 'jumps' of the reading sequence) or wants subsequent pieces that read the same way in the respective order, independent of whether that visually interleaves e.g. name and address parts.

We had very similar questions when discussing the display of bidi IRIs (the bidi solution in RFC 3987 is just one way of doing things, not necessarily the preferred one for all users and all kinds of sequences).

And for names/addresses, the addressee may also have a preference (e.g. "I write my name with the components LTR" or "I write my name with the components RTL"). But I'm not sure if mixed-direction (e.g. mixed-script) names are actually "a thing" in the relevant regions. If not, or "not really", then it might be worth putting something into the spec to recommend avoiding mixed-direction data. (I don't remember the details, but it would be okay to have e.g. an English and a Hebrew locale; the English locale should then be all Latin/LTR, and the Hebrew locale all Hebrew (script)/RTL.) In that case, neither DIR attributes nor bidi control characters would be needed.

Regards,
Martin.
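A minimal Python sketch of two techniques discussed in this thread, under stated assumptions: is_unidirectional is a hypothetical helper that checks whether a piece contains strong characters of only one direction (per the point above, such pieces need no Bidi control characters), and join_for_display is a hypothetical helper that wraps each piece in a matching FSI ... PDI pair before inline concatenation. Isolating with FSI is one common option, not something the specification prescribes, and it does not settle whether the pieces should be read LTR or RTL overall.

```python
import unicodedata

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def is_unidirectional(text: str) -> bool:
    """True if `text` has strong characters of at most one direction.

    Neutrals (spaces, punctuation) and numbers are not strong, so e.g. a
    Hebrew-only given name with spaces counts as unidirectional.
    """
    classes = {unicodedata.bidirectional(ch) for ch in text}
    has_ltr = bool(classes & STRONG_LTR)
    has_rtl = bool(classes & STRONG_RTL)
    return not (has_ltr and has_rtl)


def isolate(text: str) -> str:
    """Wrap `text` in FSI ... PDI so its direction cannot leak into neighbors."""
    return f"{FSI}{text}{PDI}"


def join_for_display(pieces: list[str], separator: str = " ") -> str:
    """Join pieces for inline display, isolating each one.

    Every piece gets its own FSI/PDI pair, so the result stays balanced.
    The overall order of the pieces is still decided by the paragraph
    direction of the surrounding text, which is the open question raised
    in this thread.
    """
    return separator.join(isolate(piece) for piece in pieces)
```

For example, joining a Hebrew given name with a Latin-script street line keeps each piece's internal order intact, while the order of the two pieces relative to each other follows the paragraph direction.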
--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call