On Monday 24 January 2011 8:06:38 am Geoffrey Myers wrote:
> Adrian Klaver wrote:
> > On Monday 24 January 2011 7:57:52 am Geoffrey Myers wrote:
> >> Adrian Klaver wrote:
> >>> On Monday 24 January 2011 6:38:55 am Geoffrey Myers wrote:
> >>>> We need to change the database encoding on our databases as they were
> >>>> created with the wrong encoding. They were created as SQL_ASCII and
> >>>> we are changing them to UTF8.
> >>>>
> >>>> When testing this Friday, I received the following error:
> >>>>
> >>>> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> >>>> pg_restore: [archiver (db)] Error from TOC entry 5225; 0 16990 TABLE
> >>>> DATA cust postgres
> >>>> pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence
> >>>> for encoding "UTF8": 0xb0
> >>>> HINT: This error can also happen if the byte sequence does not match
> >>>> the encoding expected by the server, which is controlled by
> >>>> "client_encoding".
> >>>> CONTEXT: COPY cust, line 778
> >>>
> >>> ^^^^^^^ In the COPY command for that table.
> >>
> >> I picked up on that, but the dump is binary, so I cannot view the
> >> actual code.
> >
> > Actually you can :) I should have mentioned it before. You can have
> > pg_restore restore to a file instead of a database by using the -f
> > switch. When you do that it creates plain text output. You could restore
> > the entire dump to the file or use the -t switch to get only the table
> > you need.
>
> Thanks for the suggestion. As it stands, we are getting different
> errors for different hex characters, so the solution we need is the
> ability to identify the characters that won't convert from SQL_ASCII to
> UTF8. Is there a resource that would identify these characters?
>

Well, the issue is that SQL_ASCII is not really an encoding: byte values 0-127 are treated as ASCII, and bytes 128-255 are stored as-is, with no conversion or validation.
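Since there is no list of "bad" characters to look up, one rough way to find them is to scan the plain-text output of pg_restore -f yourself. Below is a minimal Python sketch (my own, not a PostgreSQL tool) that reports every byte sequence that is not valid UTF-8, plus what those bytes would mean if the data were really WIN1252 (Python's "cp1252" codec). The cp1252 guess is an assumption; swap in latin-1 or whatever your client applications actually used. Line numbers are counted from the start of the file, not from the start of the COPY block, so they won't match the "line 778" in the error.

```python
import sys


def find_invalid_utf8(data: bytes, guess: str = "cp1252"):
    """Yield (line_no, byte_offset, hex_bytes, guessed_text) for every byte
    run in `data` that is not valid UTF-8.

    `guess` is the encoding we *suspect* the client used (an assumption);
    guessed_text is None if the bytes are undefined in that encoding too.
    """
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break  # the rest of this line is valid UTF-8
            except UnicodeDecodeError as err:
                # err.start/err.end are offsets into the slice line[pos:]
                start, end = pos + err.start, pos + err.end
                bad = line[start:end]
                try:
                    guessed = bad.decode(guess)
                except UnicodeDecodeError:
                    guessed = None
                yield line_no, start, bad.hex(), guessed
                pos = end  # resume scanning after the offending bytes


def report(path: str, guess: str = "cp1252") -> None:
    """Print one line per invalid sequence found in the file at `path`."""
    with open(path, "rb") as f:  # read raw bytes; no decoding yet
        data = f.read()
    for line_no, offset, hex_bytes, guessed in find_invalid_utf8(data, guess):
        print(f"line {line_no}, byte {offset}: 0x{hex_bytes} -> {guessed!r} in {guess}")


if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])
```

For example, you could produce a data-only text dump of just the cust table with pg_restore -a -t cust -f cust.sql your.dump (file names hypothetical) and then run the script on cust.sql. A 0xb0 byte decodes to a degree sign under cp1252 (and Latin-1), which is the kind of clue that tells you which encoding the data actually came in as.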
From the docs:
http://www.postgresql.org/docs/9.0/interactive/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED

"Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting because PostgreSQL will be unable to help you by converting or validating non-ASCII characters."

What you need to do is determine which applications were putting data into the database and what encoding they were using. I ran into this a couple of years back with an app that was using WIN1252 for data being inserted into a couple of tables in a SQL_ASCII database. Once I knew the encoding, I loaded a schema-only dump of those tables into a new UTF8 database. Then, in psql, I set client_encoding to WIN1252 and used \i to pull in a plain-text, data-only dump for each table, so the server converted the data from WIN1252 to UTF8 on the way in.

-- 
Adrian Klaver
adrian.klaver@xxxxxxxxx