Re: encoding problem at restore

Michael Fuhr <mike@xxxxxxxx> · Sun, 18 Feb 2007 09:22:19 -0700

On Sat, Feb 17, 2007 at 03:12:44AM -0800, Bob Hunter wrote:
> ERROR:  invalid byte sequence for encoding "UTF8":
> 0xe02031
> HINT:  This error can also happen if the byte sequence
> does not match the encoding expected by the server,
> which is controlled by "client_encoding".
> CONTEXT:  COPY <tablename>, line 1270
> 
> There are two problems. The first is, why UTF8 at all,
> given that the dump specifies SQL_ASCII?

Probably because the database encoding is UTF-8.  You can check with
"SHOW server_encoding", or with \l in psql, or by running "psql -l"
from a shell prompt, etc.  With a client_encoding of SQL_ASCII no
conversion will be made, so the bytes reach the server unchanged; if
the data isn't already valid UTF-8 then you get an error such as the
above.
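
To see why the server rejects those bytes, here is a rough Python
sketch (not anything PostgreSQL itself runs) that checks a byte string
for UTF-8 validity.  The function name is made up for illustration.
In UTF-8 the byte 0xe0 announces a three-byte sequence whose remaining
bytes must fall in the range 0x80-0xbf, and 0x20 does not.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start  # index of the offending byte
    return None


# The three bytes quoted in the error message.
bad = bytes.fromhex("e02031")
print(first_invalid_utf8_offset(bad))

# The same text, properly encoded as UTF-8, passes the check.
good = "\u00e0 1".encode("utf-8")
print(first_invalid_utf8_offset(good))
```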

> The second is, that at line 1270 there are (unsurprisingly) only
> ASCII  characters, so why is psql complaining at all?

Are you sure you're looking at the right line?  The line number in
the error refers to the line of the COPY data, not to the line of
the input file or stream.  For example, if the COPY begins on line
67 of the dump file then line 1270 of the data would be line 1337
of the file.  If you look at the correct line you might find a
string like "à 1" (latin small letter a with grave, space, digit
one).
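
The line arithmetic above can be sketched in Python as follows.  This
assumes a plain-text dump in which the data rows follow a line of the
form "COPY tablename (...) FROM stdin;"; the function name, the table
name "mytable", and the sample rows are placeholders of mine, and a
schema-qualified or quoted table name would need a different match.

```python
def locate_copy_row(dump_lines, table, copy_line):
    """Map a COPY-relative line number to (file line number, raw bytes).

    dump_lines: list of bytes objects, one per line of the dump file
    table:      table name as written in the COPY statement
    copy_line:  the "line N" value from the CONTEXT message
    Returns None if the COPY statement or the row is not found.
    """
    prefix = b"COPY " + table.encode("ascii") + b" "
    for file_line, raw in enumerate(dump_lines, start=1):
        if raw.startswith(prefix) and raw.rstrip().endswith(b"FROM stdin;"):
            # Data row 1 is the line right after the COPY statement.
            target = file_line + copy_line
            if target > len(dump_lines):
                return None
            return target, dump_lines[target - 1]
    return None


# In-memory stand-in for a dump whose COPY statement sits on line 67.
sample = (
    [b"-- filler\n"] * 66
    + [b"COPY mytable (id, note) FROM stdin;\n"]
    + [b"1\tok\n"] * 1269
    + [b"1270\t\xe0 1\n", b"\\.\n"]
)

file_line, raw = locate_copy_row(sample, "mytable", 1270)
# Decoding as Latin-1 never fails, so it is handy for eyeballing raw bytes.
print(file_line, repr(raw.decode("latin-1")))
```

For a real dump you would read the file in binary mode, e.g. with
open(path, "rb") and readlines(), so that no decoding happens before
you get to inspect the bytes.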

Try editing the client_encoding line to specify whatever encoding
the data is really in.  For Western European languages likely guesses
are LATIN1, LATIN9, or WIN1252 (especially the last if the data
originated on Windows).  Alternatively, you could use a converter
like iconv or uconv to convert the file to UTF-8 before feeding
it to psql.
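
If iconv or uconv isn't at hand, the same conversion can be sketched
in a few lines of Python.  This assumes the whole file is in a single
encoding; the function name and the default of "latin-1" are only
illustrative.  Python's codec names for the guesses above are
"latin-1", "iso-8859-15" (LATIN9), and "cp1252" (WIN1252).

```python
def convert_to_utf8(src_path, dst_path, source_encoding="latin-1"):
    """Re-encode a text file from source_encoding to UTF-8.

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Keep in mind that decoding as "latin-1" never raises an error, so a
wrong guess produces wrong characters silently; "cp1252" will at least
fail on the few byte values it leaves undefined.  Check a handful of
known non-ASCII rows after converting.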

-- 
Michael Fuhr