On 9/18/17 17:57, Ted Lemon wrote:
> On Sep 18, 2017, at 6:24 PM, Adam Roach <adam@xxxxxxxxxxx> wrote:
>> Unless you know something about NTFS, ext4, HFS, and exFAT that I don't, this sort of information isn't generally part of file metadata at all.
> If you download a file in your web browser and save it to disk, the thing responsible for deciding whether or not to apply the BOM is the thing that did the download, not the server from which it was downloaded. The server already identified the file encoding type: utf8 (not text/utf8, sorry about that). If the thing that did the download does the wrong thing, that's not our problem.
I think we're talking at cross purposes here.
Today, as we speak, I have a copy of the RFC repository on my hard
drive. (To be precise, I have it on most of the hard drives of the
various machines that I use.) For my current workflow, I *think* all of
them got there via rsync, although it's possible that some of them are
still using an old wget-based setup. It's kind of immaterial how they
got there, because a careful examination of them would show the same
result for either method (and for any others I could think of,
including FTP mirroring and manually downloading via web browsers): each
file is a sequence of bytes with a ".txt" file extension, identical
regardless of which tool downloaded it. There is nothing else about the
file to indicate its encoding.[1]
Okay. So, now, I open up the local file browser to that file on my hard
drive, and double-click on an RFC. An application is launched. Let's say
that application is Wordpad. How does it know which character encoding
to use for this file?
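
To make the question concrete: with only the bytes to go on, about all an
application can do is look for a byte order mark and otherwise guess. A
minimal sketch in Python follows. The name sniff_encoding and its fallback
order are hypothetical, chosen for illustration; this is not how Wordpad or
any other particular application is known to behave.

```python
import codecs

# BOM signatures, longest first so that the UTF-32 LE mark (FF FE 00 00)
# is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes) -> str:
    """Guess an encoding from raw bytes alone (hypothetical helper)."""
    # 1. An explicit BOM is the only in-band signal the file can carry.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Pure 7-bit content reads the same in ASCII, UTF-8, and Latin-1.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # 3. Bytes that happen to be valid UTF-8: a guess, not a certainty.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. No BOM and not valid UTF-8: nothing in the file says what it is.
        return "unknown"
```

Steps 3 and 4 are the crux: without a BOM or some out-of-band label, the
answer is a heuristic at best.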
/a
____
[1] If this is one of the Macs, and the download tool is really
Mac-centric, it might have included a resource fork with some additional
metadata, but (AFAIK) even the resource fork does not include character
encoding. Other operating systems have similar constructs, but I'm less
familiar with them.