Steffen Nurpmeso <steffen@xxxxxxxxxx> writes:

I don't know what the larger problems might be with
draft-ietf-jmap-contacts-09, but I think there is less trouble with
this particular point than first appears:

> I myself wonder whether that innocent RFC 9553 sentence
>
>   any valid sequence of Unicode characters encoded as a JSON string
>
> excludes surrogates?

It definitely does, because within the Unicode lexicon, a "surrogate"
is a code point, but not a code point that is assigned to a
"character".  Thus surrogates are not "characters" and cannot be
members of a "valid sequence of Unicode characters".  I haven't found
a really definite statement of this, but that is clear from both
https://en.wikipedia.org/wiki/Unicode#Architecture_and_terminology
and
https://www.unicode.org/versions/Unicode15.1.0/ch02.pdf

Note that Unicode's "character" can be a bit messy.  E.g. "lower case
a with umlaut" can be either a single "character" U+00E4 or two
"characters", U+0061 followed by the combining dieresis U+0308.  Or
for a particularly hairy ligature in one of the Brahmic scripts, see
figure 2-3 in the Unicode document I linked to above, which combines
no less than 6 "characters" into one rendered glyph.

> It should, but it then actively changes the
> meaning of "JSON string" to be a dedicated "sub-profile" of what
> "JSON string" normally means, and then to me the sentence is not
> clear enough.

In principle, you don't need to *define* a profile (sub-specification)
of JSON to say e.g. "the thing must be a JSON string encoding of a
sequence of ASCII letters", though of course in that case the set of
"things" *will be* only a subset of JSON string encodings.

But in this case, looking at RFC 4627 sec. 2.5, "Strings", it's clear
(though not directly stated) that a JSON string representation will be
a sequence of ASCII characters that represent a sequence of Unicode
characters.
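
As a rough illustration of the points so far, here is a minimal Python
sketch.  It assumes Python's standard unicodedata and json modules as
stand-ins for the Unicode character database and a JSON encoder; the
helper name contains_surrogate is made up for this example and does
not come from any of the RFCs.

```python
import json
import unicodedata


def contains_surrogate(text: str) -> bool:
    """Return True if any code point in text has general category Cs (surrogate)."""
    return any(unicodedata.category(ch) == "Cs" for ch in text)


# A lone surrogate is a code point, but not an assigned character.
lone = "\ud800"
print(contains_surrogate(lone))    # True
print(contains_surrogate("Dale"))  # False

# The same "a with umlaut", once as one character and once as two.
composed = "\u00e4"
decomposed = "a\u0308"
print(len(composed), len(decomposed))                        # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# With Python's default ensure_ascii=True, the JSON string encoding is a
# sequence of ASCII characters that stands for the Unicode character.
encoded = json.dumps(composed)
print(encoded.isascii())                # True
print(json.loads(encoded) == composed)  # True
```

The check uses the general category rather than a hard-coded range so
that it follows whatever the character database says about surrogates.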
So the limitation in this draft to "Unicode characters" matches what
the definition of JSON allows, and as such there is no subsetting.

> This seems not to mean entire grapheme clusters.  And this seems
> to mean to me that the above RFC 9553 meaning is massively
> under-defined, because there are invisible/visible modifiers,
> combining characters and more, most or all all of which will fail
> a simple "isprint" by themselves, so RFC 9553's
>
>   Implementations MUST NOT assume that text values of adjacent
>   properties are processed or displayed as a combined string; for
>   example, the values of a given name component and a surname
>   component may or may not be rendered together.
>
> combined with
>
>   any valid sequence of Unicode characters encoded as a JSON string
>
> does not make sense at all.

I think that's incorrect because there's no requirement that a Unicode
character passes an "isprint" test.  And the Unicode "general
category" attribute for characters/code points has values like "other,
control" and "other, format" that are specified as "characters" but
they're not "printable" in the ordinary sense.  See
https://en.wikipedia.org/wiki/Unicode#General_Category_property

Dale
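
P.S. To make the last point concrete, a small Python sketch follows.
It assumes the standard unicodedata module, and it uses Python's
str.isprintable() only as a loose stand-in for C's isprint(); the two
are not the same test.  The sample characters and the helper name
describe are my own choices for illustration.

```python
import unicodedata

# Sample code points chosen for illustration; all are Unicode characters.
samples = {
    "LATIN SMALL LETTER A": "a",
    "COMBINING DIAERESIS": "\u0308",
    "ZERO WIDTH JOINER": "\u200d",
    "CHARACTER TABULATION": "\t",
}


def describe(ch: str) -> tuple[str, bool]:
    """Return (general category, str.isprintable() result) for one character."""
    return unicodedata.category(ch), ch.isprintable()


for label, ch in samples.items():
    category, printable = describe(ch)
    print(f"U+{ord(ch):04X} {label}: category={category}, printable={printable}")
```

The zero width joiner reports category Cf ("other, format") and the
tab reports Cc ("other, control"); both are characters, and neither
counts as printable.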