Search Postgresql Archives

Re: Best practices for moving UTF8 databases

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Phoenix Kiula wrote:
> > I wonder: why do you spend so much time complaining instead of
> > simply locating the buggy data and fixing them?
> 
> 
> I'd love to fix them. But if I do a search for
> 
> SELECT * FROM xyz WHERE col like '%0x80%'
> 
> it doesn't work. How should I search for these characters?

I would get GNU libiconv:
http://www.gnu.org/software/libiconv/

Then take a plain text pg_dump of the database in UTF-8.
Let's call it "db.orig.dmp".

Strip all problem characters:
iconv -f UTF-8 -t UTF-8 -c <db.orig.dmp >db.stripped.sql

Compare both files:
diff -u db.orig.dmp db.stripped.sql

That output will show all lines containing a problem character.

Now the tedious part:
Use "db.orig.dmp" to find out which tables they belong to,
locate the records in the database by primary key and fix them.

Alternatively, you can use iconv's auto-repair if you know which
bytes give you a problem.
For example, if you know that the trouble stems only from 0x80 bytes
that should be Euro symbols, you could:
iconv -f UTF-8 -t UTF-8 --byte-subst="<0x%x>" <db.orig.dmp | sed -e 's/<0x80>/EUR/g' >db.fixed.sql

The resulting "db.fixed.sql" could then be loaded into the new database.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux