BOM-free tests [was Should the IETF be condoning, even promoting, BOM pollution?]

On 19/09/2017 11:16, Adam Roach wrote:
...

> Okay. So, now, I open up the local file browser to that file on my hard 
> drive, and double-click on an RFC. An application is launched. Let's say 
> that application is Wordpad. How does it know which character encoding 
> to use for this file?

So, I made a version of rfc8187.txt without a BOM. Here's what I see
on Windows 7. (FYI, there are no fails when the BOM is present.)

Notepad: fail. All the non-ASCII characters display wrongly.

WordPad: fail. (I have an ancient WordPad and a more modern one.
They both fail.)

LibreOffice: fail. I understand that LibreOffice generates a BOM
when writing plain text files.

MS Word: asks me to select an encoding. When I select UTF-8, it works OK.

Firefox: works OK if View/Text Encoding is set to Unicode.

I couldn't find any options in the first three products to get round this.
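The results above are what one would expect from a program that trusts a BOM when it sees one and otherwise assumes a legacy code page. How any of these products really decide is not known here; the following is only a minimal sketch of one possible heuristic. The function name guess_text_encoding and the cp1252 fallback are assumptions made for illustration.

```python
import codecs

def guess_text_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess an encoding for plain-text bytes (hypothetical helper)."""
    # A leading EF BB BF is treated as an unambiguous UTF-8 marker.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # No BOM: accept UTF-8 only if the whole buffer decodes strictly.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: fall back to a legacy Windows code page.
        return fallback

sample = "J\u00fcrgen".encode("utf-8")
print(guess_text_encoding(codecs.BOM_UTF8 + sample))
print(guess_text_encoding(sample))
print(guess_text_encoding("J\u00fcrgen".encode("cp1252")))
```

A program that skips the strict-decode step and goes straight from "no BOM" to the fallback would show UTF-8 text as mojibake, which matches the failures listed above.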

It would be interesting to hear similar facts from people using other systems.
(My Python3 code for removing the BOM is below, or it seems there is
a utility called iconv that can do it.)

    Brian

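# Copy rfcNNNN.txt to rfcNNNNX.txt with the leading UTF-8 BOM removed.
rnum = input('RFC number:')
with open("rfc" + rnum + ".txt", "rb") as file1:
    chunk = file1.read(1024)
    # A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    if chunk[0:3] == b'\xef\xbb\xbf':
        chunk = chunk[3:]
        with open("rfc" + rnum + "X.txt", "wb") as file2:
            # Copy the rest of the file in 1 KiB chunks.
            while chunk:
                file2.write(chunk)
                chunk = file1.read(1024)
        print("Rewritten without BOM as", "rfc" + rnum + "X.txt")
    else:
        print("No BOM found")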
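
For anyone repeating the test on data already held in memory, the same BOM check can be written against the standard library's codecs.BOM_UTF8 constant. This is a minimal sketch; strip_utf8_bom is a hypothetical helper name, not part of any library.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

with_bom = codecs.BOM_UTF8 + "caf\u00e9\n".encode("utf-8")
without_bom = strip_utf8_bom(with_bom)

# The BOM is exactly three bytes, so the output is three bytes shorter.
print(len(with_bom) - len(without_bom))

# Python's "utf-8-sig" codec skips a BOM when decoding and also
# accepts input that has none, so both forms decode to the same text.
print(with_bom.decode("utf-8-sig") == without_bom.decode("utf-8-sig"))
```

A reader written in Python that opens files with the "utf-8-sig" codec would therefore handle both the BOM and BOM-free versions of an RFC.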




