On 20 Sep. 2017 03:29, "Julian Reschke" <julian.reschke@xxxxxx> wrote:
On 2017-09-19 19:17, John C Klensin wrote:
I agree that Notepad *could* be (heuristically) sniffing for UTF-8, and it would be interesting to hear why Microsoft doesn't do that.
Not *defaulting* to UTF-8 is not a bug. It may not be what our
preference is nowadays, but that's it.
See above. Slightly different discussion. But I note that it
isn't hard to distinguish between Latin-1 and UTF-8 without
relying on BOM -- the hard problem there involves distinguishing
between the various species of 8859 and assorted code pages.
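(A minimal sketch of the Latin-1 vs UTF-8 distinction described here, assuming a strict UTF-8 decode is an acceptable test. The function name `sniff_utf8_or_latin1` is made up for illustration. It can still guess wrong in the rare case where Latin-1 text happens to form valid UTF-8 sequences, and it does nothing to tell the various 8859 parts or code pages apart.)

```python
def sniff_utf8_or_latin1(data: bytes) -> str:
    """Guess 'utf-8' or 'latin-1' for BOM-less bytes (hypothetical helper)."""
    try:
        # Strict decoding rejects stray lead bytes and bad continuation bytes,
        # which ordinary Latin-1 text with accented letters almost always contains.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "latin-1"
    # Pure ASCII also lands here, which is harmless: ASCII is valid UTF-8.
    return "utf-8"


if __name__ == "__main__":
    print(sniff_utf8_or_latin1("café".encode("utf-8")))
    print(sniff_utf8_or_latin1("café".encode("latin-1")))
```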
...
Historically, because Windows uses/d UTF-16. See this decade-old blog post, and particularly note the `dir > results.txt` snippet [1].
By the way, when it comes to Notepad's heuristics, create a text file that says "Bill fed the goats" (without the quotes), then save and open it. Unless IsTextUnicode has been updated recently, this should break the sniffer.
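To illustrate why that particular string is ambiguous, here is a rough sketch in Python. It does not call IsTextUnicode or reproduce its heuristics; it only shows that the same 18 ASCII bytes also decode cleanly as UTF-16LE, which is the kind of input a statistical sniffer can misjudge. The helper name `as_utf16le` is made up for the example.

```python
def as_utf16le(text: str) -> str:
    """Reinterpret the ASCII bytes of `text` as UTF-16LE (hypothetical helper)."""
    raw = text.encode("ascii")
    # An even byte count means every pair of bytes forms one 16-bit code unit.
    if len(raw) % 2 != 0:
        raise ValueError("odd number of bytes cannot be whole UTF-16 code units")
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    misread = as_utf16le("Bill fed the goats")
    # Each pair of ASCII letters becomes one code point, printed here in hex.
    print([hex(ord(ch)) for ch in misread])
```

Every resulting code point falls inside the CJK Unified Ideographs block, so nothing about the bytes themselves rules out UTF-16.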
Cheers
--
Matthew Kerwin