[Last-Call] <draft-bormann-dispatch-modern-network-unicode-05> (Modern Network Unicode): W3C I18N Review

Addison Phillips <addisoni18n@xxxxxxxxx> · Mon, 10 Feb 2025 14:27:30 -0800

    All,
    The W3C Internationalization Working Group (of which I am chair)
      was requested to review several IETF documents nearing or in IETF
      Last Call.
    This email represents the issues our working group noticed in our
      review of:
    https://datatracker.ietf.org/doc/draft-bormann-dispatch-modern-network-unicode/
    ---
    The specific issues our group identified are tracked in github
      here:
    https://github.com/w3c/i18n-activity/issues?q=is%3Aissue%20state%3Aopen%20label%3As%3Amodern-network-unicode
    ---
    Here are the comments:
    #1972: Exclude other non-characters
    Section 2, point number 4:

      https://datatracker.ietf.org/doc/draft-bormann-dispatch-modern-network-unicode/

        The code points U+FFFE and U+FFFF MUST NOT be used. Also, Byte
          Order Marks (leading U+FEFF characters) MUST NOT be used.

    This should probably also exclude the non-character code points
      at the end of each supplementary plane (e.g. U+1FFFE, U+2FFFF,
      U+10FFFE).
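
    A minimal sketch of such a check, assuming Python; the helper name
      is_noncharacter is hypothetical and not taken from the draft. The
      U+FDD0..U+FDEF range is not mentioned in the comment above; it is
      included here only because Unicode also designates it as
      noncharacters.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # The last two code points of every plane: U+nFFFE and U+nFFFF.
    if (cp & 0xFFFE) == 0xFFFE:
        return True
    # The contiguous block U+FDD0..U+FDEF in the BMP.
    return 0xFDD0 <= cp <= 0xFDEF


def find_noncharacters(text: str) -> list[int]:
    """Return the noncharacter code points present in text, in order."""
    return [ord(ch) for ch in text if is_noncharacter(ord(ch))]
```

    Note that U+FEFF (the BOM) is not a noncharacter, so it still needs
      the separate rule quoted above.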
    ---
    #1973: Relationship to CRLF line endings
    https://datatracker.ietf.org/doc/draft-bormann-dispatch-modern-network-unicode/
    Section 3 disallows CR in "2D MNU" (line-based Unicode
      text). Section 5 allows specs to define variants that
      include CR and CRLF line endings. Disallowing CRLF rather than
      supporting it adaptively seems likely to create a lot of
      uncertainty.
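
    One possible reading of "adaptively" is a receiver that accepts
      CRLF and maps it to LF on input. A rough sketch under that
      assumption, in Python; the function names are hypothetical, and
      leaving a lone CR untouched (so the caller can still reject it) is
      a choice made for this example, not something the draft specifies.

```python
def crlf_to_lf(text: str) -> str:
    """Map each CRLF pair to a single LF; a lone CR is left as-is."""
    return text.replace("\r\n", "\n")


def has_stray_cr(text: str) -> bool:
    """Return True if any CR remains after CRLF pairs are folded."""
    return "\r" in crlf_to_lf(text)
```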
    ---
    #1974: "With NFKC" variant considered harmful
    Section 5.7 defines a "With NFKC" variant.
    This is probably a Bad Idea.
    NFKC is destructive and might not even fully accomplish
      something useful. Mentioning the K forms is probably
      fine, but not defining this variant would avoid the
      problems it produces. Note that W3C has this note in charmod-norm:

      Unicode compatibility decomposition removes meaning
        from the text that it is applied to. That means that this
        normalization step produces the most promiscuous matches. Some
        developers and specification authors find this level of
        normalization attractive because it appears to bring together
        many strings that are logically similar, but this level of
        normalization has limited utility in actual practice and has
        side effects that confuse users. This normalization step is
        presented for completeness, but it is not generally appropriate
        for use on the Web.
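
    To make "destructive" concrete, here is a small sketch using
      Python's standard unicodedata module. The sample strings are
      chosen for this example only; each carries a distinction that
      NFKC discards and NFC preserves.

```python
import unicodedata


def nfkc(s: str) -> str:
    """Apply Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", s)


def nfc(s: str) -> str:
    """Apply Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)


# Superscript two, the "fi" ligature, and circled digit one.
samples = ["x\u00b2", "\ufb01le", "\u2460"]

for s in samples:
    # NFC keeps each string intact; NFKC folds it to plain characters.
    print(repr(s), "NFC:", repr(nfc(s)), "NFKC:", repr(nfkc(s)))
```

    After NFKC, "x²" and "x2" can no longer be told apart, which is the
      kind of lost meaning the charmod-norm note warns about.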

    ---
    #1975: Link and create harmony between this doc and W3C document "charmod-norm"
    W3C has a document whose short name (for historical
      reasons) is "charmod-norm" and whose title is "Character Model for
      the World Wide Web: String Matching". See:
      https://www.w3.org/TR/charmod-norm/. These documents have some
      similarity of content (there is also a similarity to PRECIS). It
      might be a good idea to cross-link this document and charmod-norm
      and ensure consistency when there is overlap.
    ---
    #1976: Missing 'character encoding form'?
    The Appendix A definition of terminology is pretty
      good, but doesn't mention character encoding [form], which is the
      mapping from code points in a character set to code units. This
      is actually the more commonly needed term.
    Note too the opportunity to harmonize with the I18N Glossary.
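
    As an illustration of the code point to code unit mapping that an
      encoding form defines, the following Python sketch counts code
      units for the three Unicode encoding forms. The function name
      code_unit_counts is hypothetical.

```python
def code_unit_counts(s: str) -> dict[str, int]:
    """Count code points and code units of s in each encoding form."""
    return {
        "code points": len(s),
        # UTF-8 code units are 8 bits wide.
        "UTF-8": len(s.encode("utf-8")),
        # UTF-16 code units are 16 bits wide (2 bytes each).
        "UTF-16": len(s.encode("utf-16-le")) // 2,
        # UTF-32 code units are 32 bits wide (4 bytes each).
        "UTF-32": len(s.encode("utf-32-le")) // 4,
    }


# U+1F600 is outside the BMP, so UTF-16 needs a surrogate pair.
print(code_unit_counts("\U0001F600"))
```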
    ---
    #1977: Missing discussion of surrogates?
    There is some discussion of surrogates in the
      appendices, but no mention of them in the body of the document,
      especially near the ABNF. It's probably a good idea to at least
      mention their exclusion somewhere in Section 6.
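
    For readers less familiar with the issue, a brief Python sketch of
      what excluding surrogates means in practice. The helper names are
      hypothetical. Python strings can hold lone surrogate code points
      (U+D800..U+DFFF), but the strict UTF-8 codec refuses to encode
      them.

```python
def has_surrogate(s: str) -> bool:
    """Return True if s contains any code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def encodes_as_utf8(s: str) -> bool:
    """Return True if s can be encoded with the strict UTF-8 codec."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Raised for lone surrogates, which are not Unicode scalar values.
        return False
```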
    ---
    #1978: Quirks in the history?
    There are a variety of places where one could take
      issue with the "history of Unicode" in Appendix B. I don't see any
      technical issues and don't really want to suggest any alterations,
      since this version of history conveys all of the important
      technical details and leaves out or alters some things that
      probably only matter to historians. I am filing this issue to note
      that we didn't ignore it.
    ---
    #1979: NFC and specifications
    Appendix C discusses Unicode normalization and the NFC
      form. The focus is on implementations, but there probably should
      be a mention of specifications (that is, I-Ds and other IETF
      technical documents) here (as with charmod-norm). It is primarily
      name/value matching that is affected by potential
      non-normalization. Specifications need to require (or forbid!) it
      in matching/uniqueness algorithms without requiring
      implementations to do Early Uniform Normalization on the wire.
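
    A minimal sketch of what normalizing only at comparison time could
      look like, assuming Python's standard unicodedata module. The
      function name nfc_equal is hypothetical, and whether a given
      specification should require or forbid this step is exactly the
      decision described above.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two names as if both were in NFC.

    The inputs themselves are not modified, so whatever was received
    on the wire can be stored or forwarded unchanged.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # e with acute as one code point
decomposed = "cafe\u0301"   # e followed by combining acute accent

print(precomposed == decomposed)           # code point comparison
print(nfc_equal(precomposed, decomposed))  # normalization-aware comparison
```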
    ---
    Thanks!
    Best regards (for W3C I18N),
    Addison

    -- 
Addison Phillips
Chair (W3C Internationalization WG)

Internationalization is not a feature.
It is an architecture.
