Re: Should the IETF be condoning, even promoting, BOM pollution?

Adam Roach <adam@xxxxxxxxxxx> · Mon, 18 Sep 2017 18:16:10 -0500

On 9/18/17 17:57, Ted Lemon wrote:
> On Sep 18, 2017, at 6:24 PM, Adam Roach <adam@xxxxxxxxxxx> wrote:
>> Unless you know something about NTFS, ext4, HFS, and exFAT that I don't, this sort of information isn't generally part of file metadata at all.
>
> If you download a file in your web browser and save it to disk, the thing responsible for deciding whether or not to apply the BOM is the thing that did the download, not the server from which it was downloaded. The server already identified the file encoding type: utf8 (not text/utf8, sorry about that). If the thing that did the download does the wrong thing, that's not our problem.

I think we're talking at cross purposes here.

Today, as we speak, I have a copy of the RFC repository on my hard 
drive. (To be precise, I have it on most of the hard drives of the 
various machines that I use). For my current workflow, I *think* all of 
them got there via rsync, although it's possible that some of them are 
still using an old wget-based setup. It's kind of immaterial how they 
got there, because a careful examination of them would show the same 
result between the two methods (and any others I could think of, 
including FTP mirroring and manually downloading via web browsers): each 
file is a sequence of bytes with a ".txt" file extension, identical 
regardless of which tool downloaded it. There is nothing else about the file to 
indicate its encoding.[1]

Okay. So, now, I open up the local file browser to that file on my hard 
drive, and double-click on an RFC. An application is launched. Let's say 
that application is Wordpad. How does it know which character encoding 
to use for this file?
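
To make the question concrete: with no metadata to go on, about all an 
application can do is inspect the bytes themselves. Here is a minimal sketch 
in Python of that kind of guessing. The helper name guess_encoding and the 
cp1252 fallback are assumptions made up for illustration; nothing here claims 
to describe what Wordpad or any other real application does.

```python
import codecs

# BOM signatures mapped to Python codec names that consume the BOM on decode.
# UTF-32 LE must be checked before UTF-16 LE, because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess a codec name for raw file bytes (hypothetical helper).

    1. If the data starts with a BOM, trust it.
    2. Otherwise, if the data decodes cleanly as UTF-8, call it UTF-8
       (pure ASCII also lands here).
    3. Otherwise return the fallback, which is an arbitrary assumption.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


if __name__ == "__main__":
    print(guess_encoding(codecs.BOM_UTF8 + b"Network Working Group"))
    print(guess_encoding(b"Network Working Group"))
    print(guess_encoding(b"caf\xe9"))  # a lone 0xE9 byte is not valid UTF-8
```

Without a BOM, steps 2 and 3 are pure heuristics: a file that happens to be 
valid UTF-8 gets treated as UTF-8, and everything else falls back to whatever 
legacy encoding the application assumes.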

/a

____
[1] If this is one of the Macs, and the download tool were really 
Mac-centric, it might have included a resource fork with some additional 
metadata, but (AFAIK) even the resource fork does not include character 
encoding. Other operating systems have similar constructs, but I'm less 
familiar with them.