Re: Should the IETF be condoning, even promoting, BOM pollution?

On 20 Sep. 2017 03:29, "Julian Reschke" <julian.reschke@xxxxxx> wrote:
On 2017-09-19 19:17, John C Klensin wrote:


--On Tuesday, September 19, 2017 7:05 PM +0200 Julian Reschke
<julian.reschke@xxxxxx> wrote:

Not *defaulting* to UTF-8 is not a bug. It may not be what our
preference is nowadays, but that's it.

See above.  Slightly different discussion.  But I note that it
isn't hard to distinguish between Latin-1 and UTF-8 without
relying on BOM -- the hard problem there involves distinguishing
between the various species of 8859 and assorted code pages.
...

I agree that Notepad *could* be (heuristically) sniffing for UTF-8, and it would be interesting to hear why Microsoft doesn't do that.
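To make the point about BOM-less sniffing concrete, here is a minimal sketch of such a heuristic. It is not what Notepad or any Windows API does; the function name `sniff_utf8` and its return labels are made up for illustration. It relies on the fact that Latin-1 text containing non-ASCII characters rarely happens to be well-formed UTF-8.

```python
def sniff_utf8(data: bytes) -> str:
    """Guess the encoding family of BOM-less bytes (hypothetical helper).

    Returns "ascii", "utf-8", or "not-utf-8". The last case covers
    Latin-1, the other 8859 parts, and assorted code pages, which this
    sketch makes no attempt to tell apart.
    """
    try:
        # Strict decoding rejects stray lead bytes, bad continuation
        # bytes, overlong forms, and encoded surrogates.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "not-utf-8"
    # Pure ASCII decodes as UTF-8 too, so report it separately.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    return "utf-8"


if __name__ == "__main__":
    print(sniff_utf8("café".encode("utf-8")))
    print(sniff_utf8("café".encode("latin-1")))
    print(sniff_utf8(b"plain text"))
```

The check is only a heuristic: a Latin-1 file whose high bytes happen to form valid UTF-8 sequences (for example the two bytes 0xC3 0xA9) would be reported as UTF-8.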


Historically, because Windows uses (or used) UTF-16. See this decade-old blog post, and particularly note the `dir > results.txt` snippet [1].

By the way, when it comes to Notepad's heuristics, create a text file that says "Bill fed the goats" (without the quotes), then save and open it. Unless IsTextUnicode has been updated recently, this should break the sniffer.
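For anyone without a Windows box handy, a rough sketch of why that sentence is a plausible trap: its 18 ASCII bytes form an even-length buffer that also decodes cleanly as UTF-16LE. This does not reproduce IsTextUnicode's actual tests; the helper name `misread_as_utf16le` is made up for illustration.

```python
def misread_as_utf16le(text: str) -> str:
    """Encode text as ASCII, then reinterpret the same bytes as UTF-16LE.

    Hypothetical helper: it shows what a reader sees if a sniffer
    wrongly decides that a plain ASCII file is UTF-16 little-endian.
    """
    raw = text.encode("ascii")
    # An odd byte count, or a pair that lands in the surrogate range,
    # would raise UnicodeDecodeError here.
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    garbled = misread_as_utf16le("Bill fed the goats")
    # Each pair of ASCII bytes becomes one 16-bit code unit.
    print(len(garbled), [hex(ord(ch)) for ch in garbled])
```

Every resulting code unit falls in the CJK Unified Ideographs block, so the misread text looks like legitimate (if meaningless) Chinese rather than obvious garbage.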


Cheers
--
Matthew Kerwin 
