Re: Best practices for moving UTF8 databases

Phoenix Kiula <phoenix.kiula@xxxxxxxxx> · Sun, 19 Jul 2009 10:16:17 +0800

On Tue, Jul 14, 2009 at 9:52 PM, Alvaro Herrera <alvherre@xxxxxxxxxxxxxxxxx> wrote:
> Andres Freund wrote:
>> On Tuesday 14 July 2009 11:36:57 Jasen Betts wrote:
>
>> > if you do an ascii dump and the dump starts out "SET CLIENT ENCODING
>> > 'UTF8'" or similar but you still get errors.
>> Do you mean that a dump from SQL_ASCII can yield non-UTF8 data? Right. But
>> according to the OP his 8.3 database is UTF8...
>> So there should not be invalid data in there.
>
> I haven't followed this thread, but older PG versions had less strict
> checks on UTF8 data, which meant that some invalid data could creep in.

If so, how can I check for such invalid data in my old database, which
is 8.2.9? I'm moving first to 8.3, and then to 8.4.
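
One way to look for such bytes, as a rough sketch only: take a plain-text
dump of the old database and scan it line by line for anything that does
not decode as UTF-8. The function name find_invalid_utf8 and the sample
data are made up for illustration, and Python's idea of valid UTF-8 is
assumed (not verified) to be close to what the newer server enforces.

```python
import io


def find_invalid_utf8(lines):
    """Yield (line_number, byte_offset, offending_byte) for every line
    that is not valid UTF-8. `lines` is any iterable of bytes objects,
    for example a file opened in binary mode."""
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first undecodable byte;
            # only the first problem on each line is reported.
            yield lineno, exc.start, raw[exc.start]


# In-memory stand-in for a dump file (hypothetical content).
sample = io.BytesIO(
    b"INSERT INTO t VALUES ('ok');\n"
    b"INSERT INTO t VALUES ('\x80 5');\n"
)
for lineno, offset, byte in find_invalid_utf8(sample):
    print(f"line {lineno}, byte offset {offset}: 0x{byte:02x}")
```

For a real dump, the same function could be fed a file opened with
open("dump.sql", "rb"), where dump.sql is a placeholder name. Splitting on
newlines in binary mode is safe here because no multi-byte UTF-8 sequence
contains the newline byte.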

Really, PG absolutely needs a way to upgrade the database without so
much data-related downtime and all these silly woes. Several competing
database systems are a cinch to upgrade.

Anyway, this is the annoying error I always see:

  ERROR:  invalid byte sequence for encoding "UTF8": 0x80

I think my old DB is all UTF8. If there are a few characters that are
not, how can I work with them? I've done everything I can to take care
of the encoding and such. This is the command I used for initdb:

 initdb --locale=en_US.UTF-8 --encoding=UTF8

Locale environment variables are all "en_US.UTF-8" too.
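
As for working with a few stray bytes once they are found: a byte such as
0x80 is never valid on its own in UTF-8, but it is the euro sign in
Windows-1252. The sketch below ASSUMES the bad bytes came from a
Windows-1252 client, which nothing in this thread confirms; the handler
name cp1252_fallback and the function repair_line are made up for
illustration. Valid UTF-8 is passed through untouched, and only the
undecodable bytes are reinterpreted.

```python
import codecs


def _cp1252_fallback(exc):
    """Decode-error handler: reinterpret bytes that are not valid UTF-8
    as Windows-1252 (an assumption about where they came from)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # The few byte values Windows-1252 leaves undefined become U+FFFD.
    return bad.decode("cp1252", errors="replace"), exc.end


codecs.register_error("cp1252_fallback", _cp1252_fallback)


def repair_line(raw):
    """Return `raw` re-encoded as clean UTF-8 bytes."""
    text = raw.decode("utf-8", errors="cp1252_fallback")
    return text.encode("utf-8")


# Hypothetical sample: a lone 0x80 byte in otherwise ASCII text.
print(repair_line(b"price: \x80 5"))
```

This only makes the dump loadable. Whether the guessed characters are the
right ones still has to be checked by eye, and text that is valid UTF-8
but wrong for some other reason is not detected at all.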

Thanks for any pointers!

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general