[Last-Call] <draft-bormann-dispatch-modern-network-unicode-05> (Modern Network Unicode): W3C I18N Review

All,

The W3C Internationalization Working Group (of which I am chair) was requested to review several IETF documents nearing or in IETF Last Call.

This email represents the issues our working group noticed in our review of:

https://datatracker.ietf.org/doc/draft-bormann-dispatch-modern-network-unicode/

---

The specific issues our group identified are tracked on GitHub here:

https://github.com/w3c/i18n-activity/issues?q=is%3Aissue%20state%3Aopen%20label%3As%3Amodern-network-unicode

---

Here are the comments:

#1972: Exclude other non-characters

Section 2, point number 4:
https://datatracker.ietf.org/doc/draft-bormann-dispatch-modern-network-unicode/

  4. The code points U+FFFE and U+FFFF MUST NOT be used. Also, Byte
    Order Marks (leading U+FEFF characters) MUST NOT be used.

This should probably also exclude the non-character code points at the end of each supplementary plane (e.g. U+1FFFE, U+2FFFF, U+10FFFE, etc.)
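
To make the scope of this comment concrete, here is a minimal sketch of a check covering all 66 noncharacters defined by Unicode. The function name is hypothetical, and including U+FDD0..U+FDEF goes beyond the plane-final code points named above; it is an assumption that the draft would want to exclude those as well.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters.

    Hypothetical helper for illustration; not taken from the draft.
    """
    # U+FDD0..U+FDEF: a contiguous run of 32 noncharacters in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of each of the 17 planes (U+nFFFE, U+nFFFF).
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


if __name__ == "__main__":
    for cp in (0xFFFE, 0x1FFFE, 0x2FFFF, 0x10FFFE, 0xFEFF):
        print(f"U+{cp:04X}", is_noncharacter(cp))
```

Note that U+FEFF (the Byte Order Mark) is an ordinary assigned character, not a noncharacter, so it needs its own rule, as the draft already has.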

---

#1973: Relationship to CRLF line endings

https://datatracker.ietf.org/doc/draft-bormann-dispatch-modern-network-unicode/

Section 3 disallows CR in "2D MNU" (line-based Unicode text). Section 5 allows specifications to define variants, some of which permit CR and CRLF line endings. Disallowing CRLF by default, rather than supporting it adaptively, seems likely to create a lot of uncertainty.
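
As one possible reading of "adaptively", a receiver could accept either LF or CRLF as a line terminator without needing a separately declared variant. This is only a sketch under that assumption; the function name is hypothetical and nothing here is proposed text for the draft.

```python
import re


def split_lines(text: str) -> list[str]:
    """Split on LF or CRLF; a bare CR is left inside the line untouched."""
    return re.split(r"\r\n|\n", text)


if __name__ == "__main__":
    print(split_lines("first\r\nsecond\nthird"))
```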

---

#1974: "With NFKC" variant considered harmful

Section 5.7 defines a "With NFKC" variant.

This is probably a Bad Idea.

NFKC is destructive and might also fall short of accomplishing anything useful. Mentioning the K forms is probably fine, but by not defining this variant, the document could stay away from the problems it produces. Note that W3C has this note in charmod-norm:

Unicode compatibility decomposition removes meaning from the text that it is applied to. That means that this normalization step produces the most promiscuous matches. Some developers and specification authors find this level of normalization attractive because it appears to bring together many strings that are logically similar, but this level of normalization has limited utility in actual practice and has side effects that confuse users. This normalization step is presented for completeness, but it is not generally appropriate for use on the Web.
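
A short illustration of the loss described in that note, using Python's standard `unicodedata` module. The sample strings are my own choice: NFC leaves each of them unchanged, while NFKC folds away the superscript, ligature, circled-digit, and fullwidth distinctions.

```python
import unicodedata

# Sample strings chosen for illustration only.
SAMPLES = [
    "x\u00b2",                         # x followed by SUPERSCRIPT TWO
    "\ufb01n",                         # LATIN SMALL LIGATURE FI followed by n
    "\u2460",                          # CIRCLED DIGIT ONE
    "\uff28\uff45\uff4c\uff4c\uff4f",  # fullwidth "Hello"
]


def compare_forms(text: str) -> tuple[str, str]:
    """Return the (NFC, NFKC) forms of text."""
    return (
        unicodedata.normalize("NFC", text),
        unicodedata.normalize("NFKC", text),
    )


if __name__ == "__main__":
    for sample in SAMPLES:
        nfc, nfkc = compare_forms(sample)
        print(repr(sample), "NFC:", repr(nfc), "NFKC:", repr(nfkc))
```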

---

#1975: Link and create harmony between this doc and W3C document "charmod-norm"

W3C has a document whose short name (for historical reasons) is "charmod-norm" and whose title is "Character Model for the World Wide Web: String Matching". See: https://www.w3.org/TR/charmod-norm/. These documents have some similarity of content (there is also a similarity to PRECIS). It might be a good idea to cross-link this document and charmod-norm and ensure consistency when there is overlap.

---

#1976: Missing 'character encoding form'?

The Appendix A definition of terminology is pretty good, but doesn't mention character encoding [form], which is the mapping from the code points in a character set to code units. This is actually the more commonly needed term.

Note too the opportunity to harmonize with the I18N Glossary.
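
For readers less familiar with the term, the following sketch shows one code point mapped to code units under the three Unicode encoding forms. The helper name is hypothetical, and the example code point (U+1F600) is my own choice.

```python
def code_units(text: str) -> dict[str, list[int]]:
    """Return the code units of text in UTF-8, UTF-16, and UTF-32."""
    utf16_bytes = text.encode("utf-16-be")
    return {
        # UTF-8: 8-bit code units.
        "UTF-8": list(text.encode("utf-8")),
        # UTF-16: 16-bit code units, read here from big-endian bytes.
        "UTF-16": [
            int.from_bytes(utf16_bytes[i:i + 2], "big")
            for i in range(0, len(utf16_bytes), 2)
        ],
        # UTF-32: one 32-bit code unit per code point.
        "UTF-32": [ord(ch) for ch in text],
    }


if __name__ == "__main__":
    for form, units in code_units("\U0001F600").items():
        print(form, [f"{u:X}" for u in units])
```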

---

#1977: Missing discussion of surrogates?

There is some discussion of surrogates in the appendices, but no mention of them in the body of the document, especially near the ABNF. It's probably a good idea to at least mention their exclusion somewhere in Section 6.
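
As a rough illustration of what that exclusion means in practice: surrogate code points (U+D800..U+DFFF) are not Unicode scalar values and cannot be encoded as well-formed UTF-8. The helper names below are hypothetical.

```python
def has_surrogates(text: str) -> bool:
    """Return True if text contains any code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def encodes_as_utf8(text: str) -> bool:
    """Return True if Python's strict UTF-8 encoder accepts text."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    lone = "a\ud800"  # contains an unpaired high surrogate
    print(has_surrogates(lone), encodes_as_utf8(lone))
```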

---

#1978: Quirks in the history?

There are a variety of places where one could take issue with the "history of Unicode" in Appendix B. I don't see any technical issues and don't really want to suggest any alterations, since this version of history conveys all of the important technical details and leaves out or alters some things that probably only matter to historians. Making this issue to note that we didn't ignore it.

---

#1979: NFC and specifications

Appendix C discusses Unicode normalization and the NFC form. The focus is on implementations, but there probably should be a mention of specifications (that is, I-Ds and other IETF technical documents) here (as with charmod-norm). It is primarily name/value matching that is affected by potential non-normalization. Specifications need to require (or forbid!) it in matching/uniqueness algorithms without requiring implementations to do Early Uniform Normalization on the wire.
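
A minimal sketch of the distinction drawn above, assuming a specification that requires NFC for comparison only: the match is computed on normalized copies, while the strings as received are left untouched. The function name is hypothetical.

```python
import unicodedata


def names_match(a: str, b: str) -> bool:
    """Compare two names under NFC without modifying either input."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    precomposed = "caf\u00e9"   # ends in U+00E9
    decomposed = "cafe\u0301"   # ends in e followed by U+0301
    print(precomposed == decomposed)             # raw code point comparison
    print(names_match(precomposed, decomposed))  # comparison under NFC
```

A specification that forbids normalization in matching would use the raw comparison instead; either way, the choice belongs in the matching algorithm rather than in a requirement on what is sent.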

---

Thanks!

Best regards (for W3C I18N),

Addison

-- 
Addison Phillips
Chair (W3C Internationalization WG)

Internationalization is not a feature.
It is an architecture.
-- 
last-call mailing list -- last-call@xxxxxxxx
To unsubscribe send an email to last-call-leave@xxxxxxxx
