Re: Using psql -f to load a UTF8 file

Roger Leigh <rleigh@xxxxxxxxxxxxx> · Fri, 21 Sep 2012 10:40:45 +0100

On Fri, Sep 21, 2012 at 09:21:36AM +0800, Craig Ringer wrote:
> On 09/20/2012 11:44 PM, Leif Biberg Kristensen wrote:
> >  On Thursday, 20 September 2012 at 16:56:16, Alan Millington wrote:
> >>psql". But how am I supposed to remove the byte order mark from a UTF8
> >>file? I thought that the whole point of the byte order mark was to tell
> >>programs what the file encoding is. Other programs, such as Python, rely
> >>on this.
> >
> >http://en.wikipedia.org/wiki/Byte_order_mark
> >
> >While the Byte Order Mark is important for UTF-16, it's totally irrelevant to
> >the UTF-8 encoding.
> 
> I strongly disagree. The BOM provides a useful and standard way to
> differentiate UTF-8 encoded text files from the random pile of
> encodings that any given file could be.

Use of the BOM in UTF-8 causes a host of display and interoperability
problems, and is considered by many to be a broken practice.  It's
also pointless, since UTF-8 has no byte-ordering issues.  Best not to
use it at all.  In any case, the BOM byte sequence (EF BB BF) does not
unambiguously identify UTF-8; the same bytes are equally valid in
8-bit charsets (in ISO-8859-1 they read as "ï»¿"), so an external
means of specifying the encoding is preferable and more robust.
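
For the original question of how to remove the mark, here is a minimal
sketch in Python, assuming the file is small enough to read into
memory. The function names and file paths are hypothetical, chosen
only for illustration. It drops a leading EF BB BF if one is present,
and also shows that the same three bytes decode cleanly as ISO-8859-1
text.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_from_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM.

    Hypothetical helper: reads the whole file at once, so it only
    suits files that fit comfortably in memory.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_utf8_bom(data))


# The BOM bytes are also valid text in an 8-bit charset, which is why
# they cannot by themselves prove that a file is UTF-8.
bom_as_latin1 = codecs.BOM_UTF8.decode("latin-1")
```

The cleaned file can then be passed to psql -f, with the encoding
stated externally, for example through the PGCLIENTENCODING
environment variable or an \encoding UTF8 line at the top of the
script.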

Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux    http://people.debian.org/~rleigh/
 `. `'   schroot and sbuild  http://alioth.debian.org/projects/buildd-tools
   `-    GPG Public Key      F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)