[Last-Call] Re: [art] Re: Artart telechat review of draft-ietf-jmap-contacts-09

Uh, is there a coherent explanation of why RFC 8259 allows non-character
code points? Or, more specifically, why it allows surrogate code points?
"Not-yet-assigned" code points I can see as plausibly allowing, but
surrogate code points will never be assigned characters.

(long version)

The question can be answered, though I'm not suggesting we cite any of these.

1) There was originally no JSON parser in browsers. JSON spread quickly anyway, because you could just pass it to eval(). Obviously, that approach has security and conformance issues, so now we have JSON.parse or an equivalent everywhere, but this initial effort would have been around the year 2000. There were also earlier efforts that sometimes looked like JSON (Netscape Enterprise Server, etc.).

2) But because it originally relied on JavaScript and eval(), it used JS Strings. json.org had a better parser pretty soon after, but it still used JS Strings.

3) So, why were JS Strings so awkward? Because most GUI operating systems of the time used UCS-2 strings (not even UTF-16).
https://simonsapin.github.io/wtf-8/#motivation
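
To make the UCS-2/UTF-16 point concrete, here is a small sketch (Python chosen only for illustration; `utf16_length` and `is_valid_unicode` are hypothetical helpers, not from any spec). A JS string is a sequence of 16-bit code units, so a character outside the BMP costs two units, and an unpaired surrogate is a perfectly legal JS string even though it is not well-formed Unicode:

```python
def utf16_length(s: str) -> int:
    # Count UTF-16 code units, the same quantity a JavaScript string's .length reports.
    # "surrogatepass" lets lone surrogates through, mimicking JS strings, which are
    # arbitrary sequences of 16-bit code units.
    return len(s.encode("utf-16-le", "surrogatepass")) // 2


def is_valid_unicode(s: str) -> bool:
    # Well-formed Unicode text contains no surrogate code points (U+D800..U+DFFF);
    # strict UTF-8 encoding rejects them.
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


emoji = "\U0001F600"  # U+1F600, outside the Basic Multilingual Plane
lone = "\ud800"       # a high surrogate with no low surrogate after it

print(len(emoji), utf16_length(emoji))            # 1 2
print(utf16_length(lone))                         # 1
print(is_valid_unicode(emoji), is_valid_unicode(lone))  # True False
```

The lone surrogate is exactly the kind of value a string model built on 16-bit units lets through and a strict UTF-8 encoder refuses.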

4) The first Linux distribution to switch to UTF-8 by default was in 2002:
"Red Hat Linux 8.0 (September 2002) was the first distribution to take the leap of switching to UTF-8 as the default encoding for most locales. The only exceptions were Chinese/Japanese/Korean locales, for which there were at the time still too many specialized tools available that did not yet support UTF-8"
https://www.cl.cam.ac.uk/~mgk25/unicode.html#linux

5) Then you get to the nastier problem of escape sequences. Why have them at all? They're what let you ship Unicode when not every system supports UTF-8. For example, Shift JIS is still used by 5.2% of sites in the .jp domain.
https://en.wikipedia.org/wiki/Shift_JIS
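
A quick illustration of how the escapes interact with the UTF-16 heritage, using Python's standard json module purely as an example implementation (other parsers may behave differently). With escaping on, non-BMP characters come out as surrogate pairs, and because the escape grammar is just four hex digits, a lone surrogate escape parses without complaint:

```python
import json

# ensure_ascii=True is Python's default: every non-ASCII character is written as a
# \uXXXX escape, so the output is pure ASCII. Characters outside the BMP become
# UTF-16 surrogate pairs, which is JavaScript's string model showing up on the wire.
text = "\u65e5\u672c\U0001F600"    # two CJK characters followed by U+1F600
escaped = json.dumps(text)
print(escaped)                     # "\u65e5\u672c\ud83d\ude00"
print(escaped.isascii())           # True

# A lone surrogate escape is syntactically valid JSON; Python's parser accepts it
# and hands back a one-character string holding U+D800.
lone = json.loads('"\\ud800"')
print(len(lone), hex(ord(lone)))   # 1 0xd800

# ensure_ascii=False writes the characters themselves instead of escapes.
raw = json.dumps(text, ensure_ascii=False)
print(raw == '"' + text + '"')     # True
```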

It's definitely better to use UTF-8 with no escape sequences if you're making something new, but sometimes the task is to consume content you don't control.
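
For the case of consuming content you don't control, one option is to parse leniently and then reject any string that still contains a surrogate code point. A minimal Python sketch, assuming the input has already been decoded to a str; `parse_strict_json` and its helpers are hypothetical names, not part of any library:

```python
import json
from typing import Any


def has_surrogate(s: str) -> bool:
    # json.loads already combines valid \uD8xx\uDCxx escape pairs into one character,
    # so any code point left in U+D800..U+DFFF is unpaired.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def check_strings(value: Any) -> None:
    # Recursively visit every string in the parsed value, including object keys.
    if isinstance(value, str):
        if has_surrogate(value):
            raise ValueError(f"unpaired surrogate in {value!r}")
    elif isinstance(value, dict):
        for key, item in value.items():
            check_strings(key)
            check_strings(item)
    elif isinstance(value, list):
        for item in value:
            check_strings(item)


def parse_strict_json(text: str) -> Any:
    # Parse with the standard (lenient) parser, then enforce well-formed Unicode.
    value = json.loads(text)
    check_strings(value)
    return value
```

A properly paired escape such as `"\ud83d\ude00"` passes and yields U+1F600, while `"\ud800"` anywhere in the document, keys included, raises ValueError.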

thanks,
Rob

-- 
last-call mailing list -- last-call@xxxxxxxx
To unsubscribe send an email to last-call-leave@xxxxxxxx
