Re: BOM-free tests [was Should the IETF be condoning, even promoting, BOM pollution?]

----- Original Message -----
From: "Brian E Carpenter" <brian.e.carpenter@xxxxxxxxx>
To: "IETF" <ietf@xxxxxxxx>
Sent: Tuesday, September 19, 2017 2:24 AM

> On 19/09/2017 11:16, Adam Roach wrote:
> ...
>
> > Okay. So, now, I open up the local file browser to that file on my hard
> > drive, and double-click on an RFC. An application is launched. Let's say
> > that application is Wordpad. How does it know which character encoding
> > to use for this file?
>
> So, I made a version of rfc8187.txt without a BOM. Here's what I see
> on Windows 7. (FYI, there are no fails when the BOM is present.)
>
> Notepad: fail. All the non-ASCII characters display wrongly.

agree but my (elderly?) Notepad with BOM present displays the BOM as

i double dot   >>  inverted question mark

no surprise there.

Older versions of Wordpro and Word Viewer likewise display the BOM when
it is present.

A newer Word Viewer, given the file with no BOM, asks me which encoding
to use and then displays it OK.

> Wordpad: fail. (I have an ancient Wordpad and a more modern one.
> They both fail.)

agree but I note that when I save a file in UTF using Wordpad it uses
UTF-16, not UTF-8, AFAICT.
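
One way to check which form a saved file actually has is to look at its
first few bytes. A rough sketch in Python, using the BOM constants from
the standard codecs module; the function name sniff_bom is made up for
illustration, and it only recognises a BOM, it does not guess the
encoding of a BOM-free file:

```python
import codecs

# Hypothetical helper: report which Unicode BOM, if any, the data starts with.
def sniff_bom(data: bytes) -> str:
    # The UTF-32 marks are tested before UTF-16 because the UTF-32-LE BOM
    # (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for mark, name in boms:
        if data.startswith(mark):
            return name
    return "none"
```

Reading the first four bytes of the file in "rb" mode and passing them
to sniff_bom is enough to tell a UTF-8 BOM from a UTF-16 one.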

Tom Petch

> Libreoffice: fail. I understand that Libreoffice generates a BOM
> when writing plain text files.
>
> MS Word: asks me to select an encoding. When I select UTF-8, it works
> OK.
>
> Firefox: Works OK if View/Text Encoding is set to Unicode
>
> I couldn't find any options in the first three products to get round
> this.
>
> It would be interesting to hear similar facts from people using other
> systems.
> (My Python3 code for removing the BOM is below, or it seems there is
> a utility called iconv that can do it.)
>
>     Brian
>
> rnum=input('RFC number:')
> file1 = open("rfc"+rnum+".txt", "rb")
> chunk=file1.read(1024)
> if chunk[0:3] == b'\xef\xbb\xbf':
>     chunk=chunk[3:]
>     file2=open("rfc"+rnum+"X.txt", "wb")
>     while chunk:
>         file2.write(chunk)
>         chunk=file1.read(1024)
>     file2.close()
>     print("Rewritten without BOM as","rfc"+rnum+"X.txt")
> else:
>     print("No BOM found")
> file1.close()
>
>
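For comparison, the same BOM removal could probably be done by letting
Python's "utf-8-sig" codec drop the mark on decode. This is only a
sketch, not a replacement for the script above: the function name
strip_bom is made up, and unlike the byte-level copy it decodes and
re-encodes the text, so it will raise an error on a file that is not
valid UTF-8.

```python
import shutil

# Hypothetical helper: copy src to dst as UTF-8 without a leading BOM.
def strip_bom(src: str, dst: str) -> None:
    # "utf-8-sig" skips a leading BOM if one is present and otherwise
    # behaves like plain UTF-8; newline="" keeps line endings untouched.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        shutil.copyfileobj(fin, fout)
```

A file that has no BOM to begin with comes out byte-for-byte the same.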



