Re: Using psql -f to load a UTF8 file

Chris Angelico <rosuav@xxxxxxxxx> · Fri, 21 Sep 2012 13:07:11 +1000

On Fri, Sep 21, 2012 at 11:21 AM, Craig Ringer <ringerc@xxxxxxxxxxxxx> wrote:
> I strongly disagree. The BOM provides a useful and standard way to
> differentiate UTF-8 encoded text files from the random pile of encodings
> that any given file could be.

The only reliable way to ascertain the encoding of a hunk of data is
with something out-of-band. Relying on the first three bytes being
\xEF\xBB\xBF is not much more reliable than detecting based on octet
frequency, which is what leads to the "Bush hid the facts" hack in
Notepad. This is why many Internet protocols have metadata carried
along with the file (eg Content-type in HTTP), rather than relying on
internal evidence.

> psql should accept UTF-8 with BOM.

However, this I would agree with. It's cheap enough to detect, and
aside from arbitrarily trying to kill Notepad (which won't happen
anyway), there's not a lot of reason to choke on the BOM. But it's not
a big deal.

ChrisA

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general