On Fri, Sep 21, 2012 at 11:21 AM, Craig Ringer <ringerc@xxxxxxxxxxxxx> wrote:
> I strongly disagree. The BOM provides a useful and standard way to
> differentiate UTF-8 encoded text files from the random pile of encodings
> that any given file could be.

The only reliable way to ascertain the encoding of a hunk of data is with something out-of-band. Relying on the first three bytes being \xEF\xBB\xBF is not much more reliable than detecting based on octet frequency, which is what leads to the "Bush hid the facts" bug in Notepad. This is why many Internet protocols carry metadata along with the file (e.g. Content-Type in HTTP) rather than relying on internal evidence.

> psql should accept UTF-8 with BOM.

This, however, I would agree with. It's cheap enough to detect, and aside from arbitrarily trying to kill Notepad (which won't happen anyway), there's not a lot of reason to choke on the BOM. But it's not a big deal.

ChrisA

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
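To illustrate how cheap the BOM check is, here is a minimal Python sketch. It is not psql's actual code; `UTF8_BOM` and `strip_utf8_bom` are hypothetical names chosen for illustration. It strips a leading UTF-8 BOM from a script's raw bytes. It also shows why the BOM is only a hint rather than proof of encoding: the same three bytes decode without error as Latin-1.

```python
from __future__ import annotations

# The UTF-8 encoding of U+FEFF (BYTE ORDER MARK).
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> tuple[bytes, bool]:
    """Return (data without a leading UTF-8 BOM, whether a BOM was present).

    Hypothetical helper: it only inspects the first three bytes and does
    not validate that the rest of the input is actually UTF-8.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False


if __name__ == "__main__":
    script = UTF8_BOM + b"SELECT 1;\n"
    body, had_bom = strip_utf8_bom(script)
    print(had_bom, body)  # True b'SELECT 1;\n'

    # Python's built-in "utf-8-sig" codec does the same strip while decoding.
    print(repr(script.decode("utf-8-sig")))  # 'SELECT 1;\n'

    # The same bytes are also valid Latin-1, so they are internal evidence
    # only; out-of-band metadata is still the reliable signal.
    print(UTF8_BOM.decode("latin-1"))  # ï»¿
```

Skipping the marker costs one three-byte prefix comparison, which is why accepting it is cheap even if it cannot settle the encoding question on its own.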