Thank you for the link. I am using Notepad, which inserts the byte order mark. Following the links a bit further, I gather that the version of Notepad I am using may not identify a UTF-8 file correctly if the byte order mark is omitted. Also, as I mentioned, Python makes use of it. (From the Python documentation on Encoding declarations: "If the first bytes of the file are the UTF-8 byte-order mark ('\xef\xbb\xbf'), the declared file encoding is UTF-8 (this is supported, among others, by Microsoft’s Notepad).")
The conclusion seems to be that I must use one editor for Python, and another for Postgres.
From: Leif Biberg Kristensen <leif@xxxxxxxxxxxxxx>
To: Postgres general mailing list <pgsql-general@xxxxxxxxxxxxxx>
Cc: Alan Millington <admillington@xxxxxxxxxxx>
Sent: Thursday, 20 September 2012, 16:44
Subject: Re: [GENERAL] Using psql -f to load a UTF8 file
On Thursday 20 September 2012 at 16:56:16, Alan Millington wrote:
> psql". But how am I supposed to remove the byte order mark from a UTF8
> file? I thought that the whole point of the byte order mark was to tell
> programs what the file encoding is. Other programs, such as Python, rely
> on this.
http://en.wikipedia.org/wiki/Byte_order_mark
While the Byte Order Mark is important for UTF-16, it is not needed for
UTF-8, which is a byte-oriented encoding with no byte-order ambiguity. Still,
you'll find several editors that automatically insert a BOM in every text
file. There is usually a setting "Insert Byte Order Mark" somewhere in the
configuration, and it may be on by default.
regards, Leif
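
For anyone who cannot change the editor setting, one possible workaround is to strip the mark from a copy of the file before handing it to psql. The following is only a minimal sketch in Python, not something taken from the thread: the function names and the file names in the usage note are hypothetical, and it assumes the file is small enough to read into memory.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present."""
    # codecs.BOM_UTF8 is the three-byte sequence b'\xef\xbb\xbf'.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def write_bom_free_copy(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM (hypothetical helper)."""
    # Binary mode is used so that no other bytes in the file are altered.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_utf8_bom(data))
```

With this, something like write_bom_free_copy("load.sql", "load_nobom.sql") followed by psql -f load_nobom.sql would leave the Notepad original, BOM included, untouched for Python. Both file names are made up for illustration.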