pg_dump encoding problem

Jeff Davis <pgsql@xxxxxxxxxxx> · Thu, 19 Oct 2006 11:22:56 -0700

I am migrating a database from 7.4 with SQL_ASCII encoding to 8.1 with
UTF8. I made a pg_dump of the 7.4 database. I ran into trouble going
straight into 8.1 with UTF8, because the original database contains
bytes that are not valid UTF8 (0xb9, for example). So I tried importing
the dump into a temporary 8.1 cluster that I set to SQL_ASCII encoding.
That import went fine.
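
To get a picture of how many such bytes a dump contains, something like
the following could be run over the dump file. This is only a rough
sketch: the function names and the dump path are made up for
illustration, and it simply reports every byte that Python's UTF-8
decoder refuses, line by line.

```python
def invalid_utf8_bytes(line: bytes):
    """Return (offset, byte_value) pairs for bytes in `line` that the
    UTF-8 decoder rejects."""
    bad = []
    pos = 0
    while pos < len(line):
        try:
            # Decode the remainder; success means nothing else is invalid.
            line[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as exc:
            # exc.start is relative to the slice we tried to decode.
            bad_at = pos + exc.start
            bad.append((bad_at, line[bad_at]))
            pos = bad_at + 1
    return bad


def scan_dump(path):
    """Print every offending byte in a dump file (hypothetical helper)."""
    with open(path, "rb") as dump:  # binary mode: no decoding on read
        for lineno, line in enumerate(dump, start=1):
            for offset, value in invalid_utf8_bytes(line):
                print(f"line {lineno}, offset {offset}: 0x{value:02x}")
```

For example, scan_dump("mydb.dump") would list each offending byte with
its line number, where "mydb.dump" stands in for the real dump file.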

So, basically, I am now trying to move data from 8.1 in SQL_ASCII to 8.1
in UTF8. I know that text fields in a UTF8 database can hold the invalid
sequences, because I can do:

=> create table foo(t text);
CREATE TABLE
=> insert into foo values(E'a\xb9c');
INSERT 0 1
=> insert into foo values('abc');
INSERT 0 1
=> select t,length(t) from foo;
  t  | length
-----+--------
 ac  |      3
 abc |      3

That's how I want to import the data. I want the application to behave
as much like before as possible, so I would rather not strip the
non-UTF8 bytes.

Is there a way to get pg_dump to use the escape sequences instead of
writing the binary value? Is what I'm trying to do dangerous?
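
Failing a pg_dump option, the dump could perhaps be post-processed. The
sketch below assumes the data is in COPY text format, where a backslash
followed by up to three octal digits stands for the byte with that
value. It rewrites every byte >= 0x80 in a data line as such an escape.
The function name is made up for illustration, it should only be applied
to COPY data lines and not to the SQL statements in the dump, and
whether a UTF8 server accepts the result is not something it
establishes.

```python
def escape_high_bytes(line: bytes) -> bytes:
    """Rewrite each byte >= 0x80 as a backslash plus three octal digits.

    Assumes `line` is a COPY text-format data line; backslashes already
    present in the line are left untouched.
    """
    out = bytearray()
    for b in line:
        if b >= 0x80:
            out += b"\\%03o" % b  # e.g. 0xb9 becomes \271
        else:
            out.append(b)
    return bytes(out)
```

With this, the line b"a\xb9c" becomes the seven ASCII bytes a\271c.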

I am still investigating how the application filters the data. If it
sends the binary character inside the query, is there any way to make a
UTF8-encoded database accept that? Do I have to create a separate
database encoded with SQL_ASCII?

Regards,
	Jeff Davis