> There are no lead bytes in UTF-8

Sorry, sloppy use of terminology. I should have said "UTF signatures",
aka the "byte order mark". IOW, the "magic number" bytes commonly found
at the front of UTF-encoded files:

  UTF-16 little-endian   FF FE
  UTF-16 big-endian      FE FF
  UTF-8                  EF BB BF

These tend to be inserted automatically by text editors, so it would be
advantageous to have them handled automatically by COPY (at least as an
option). Right now, if I edit a UTF-8 file and then load it with COPY, I
get errors or bad data if the editor chose to add the 3 signature bytes.

Whilst UTF-16 is not supported internally, COPY seems to be a legitimate
special case, because it is used for migration to/from other tools that
may emit or expect UTF-16. ISTR that Postgres uses ICU? If so, it would
be near-trivial to allow COPY to read and write UTF-16. If done via a
syntax extension to COPY (which I think is the most desirable
implementation), this would have no adverse effect on any other
capability.

It also seems sufficiently isolated from sensitive/complex areas of the
code that it might make a suitable first project for someone who is
interested in becoming a contributor...

-- 
Peter Headland
Architect
Actuate Corporation

-----Original Message-----
From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx]
Sent: Thursday, September 10, 2009 11:13
To: Peter Headland
Cc: pgsql-general@xxxxxxxxxxxxxx
Subject: Re: COPY command character set

"Peter Headland" <pheadland@xxxxxxxxxxx> writes:
> How about my suggestion to add a means (extend COPY syntax) to specify
> encoding explicitly and handle UTF lead bytes - would that be of
> interest?

There are no lead bytes in UTF-8, and we make no pretense of handling
UTF-16, so I don't think we'd be interested in some hack that cleans up
misencoding problems.

The idea of overriding client_encoding has been suggested before. I
don't remember if it was rejected or is just languishing on the TODO list.
I'd be a little worried about sending clients data in an encoding they
aren't expecting, but if it only works for I/O to a file it might be okay.

			regards, tom lane
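
As a workaround sketch for the signature problem described above, a file
can be preprocessed before COPY ever sees it. The following is a minimal
Python example, not anything PostgreSQL provides: the function name
to_plain_utf8 and the fallback to UTF-8 when no signature is present are
assumptions made for illustration, and UTF-32 signatures are not
considered.

```python
import codecs

# Signature bytes from the list in the message, mapped to Python codec names.
SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def to_plain_utf8(data: bytes, default: str = "utf-8") -> bytes:
    """Return data re-encoded as UTF-8 with any leading signature removed.

    Hypothetical helper: 'default' is the encoding assumed when the input
    carries no signature at all.
    """
    for bom, encoding in SIGNATURES:
        if data.startswith(bom):
            # Drop the signature, then transcode the remainder to UTF-8.
            return data[len(bom):].decode(encoding).encode("utf-8")
    # No signature found: fall back to the assumed default encoding.
    return data.decode(default).encode("utf-8")
```

The output of such a step could then be written to a temporary file and
loaded with COPY under a UTF8 client_encoding, which sidesteps both the
stray EF BB BF bytes and UTF-16 input without any server change.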