Sim Zacks wrote:
> We originally tested it on MySQL and now we are migrating it
> to PostgreSQL.
>
> The messages are stored in a longblob field on MySQL and a bytea field
> in PostgreSQL.
>
> I set the database up as UTF-8, even though we get emails that are not
> UTF encoded, mostly because I didn't know what else to try that would
> incorporate all the possible encodings. Examples of 3 encodings we
> regularly receive are: UTF-8, Windows-1255, ISO-8859-8-I.
[...]
> It would not transfer through the dbi-link, so I wrote a Python script
> (see below) to read a row from MySQL and write a row to PostgreSQL
> (using PyGreSQL and MySQLdb).
> When I used PyGreSQL's escape_bytea function to copy the data, it went
> smoothly, but the data was corrupt.
> When I tried the escape_string function, it died because the data it was
> moving was not UTF-8.
>
> I finally got it to work by defining a database as SQL_ASCII and then
> using escape_string worked. After the data was all in place, I pg_dumped
> and pg_restored into a UTF-8 database, and it surprisingly works now.

It's very difficult to know what exactly happened unless you have some
examples of a byte sequence that illustrates what you describe:
how it looked in MySQL, how it looked in your Python script, and what
you fed to escape_bytea.

What client encoding did you use in your Python script?

Yours,
Laurenz Albe
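
To make the distinction concrete, here is a minimal, illustrative re-implementation of PostgreSQL's classic octal "escape" format for bytea. It is not PyGreSQL's actual escape_bytea, just a sketch of the idea: a bytea escape works byte by byte and never decodes the data, while a text-oriented escape (like escape_string under a UTF-8 client encoding) must treat the input as valid text, which fails on Windows-1255 bytes.

```python
def escape_bytea(data: bytes) -> str:
    """Escape raw bytes into PostgreSQL's classic octal bytea input form.

    Illustrative sketch only -- not PyGreSQL's real implementation.
    """
    out = []
    for b in data:
        if b == 0x5C:                              # backslash -> \\
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E and b != 0x27:      # printable ASCII passes through
            out.append(chr(b))
        else:                                      # quote and non-printables -> \nnn
            out.append("\\%03o" % b)
    return "".join(out)


# A Windows-1255 byte sequence (Hebrew text) that is NOT valid UTF-8:
raw = "שלום".encode("windows-1255")

# escape_bytea handles it byte by byte; no decoding is needed:
print(escape_bytea(raw))

# A text-oriented escape would first have to decode the bytes,
# which is exactly where a UTF-8 database rejects the data:
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    print("not UTF-8:", e.reason)
```

This is why copying the raw message bytes through escape_bytea "went smoothly" regardless of encoding, and why escape_string only worked once the database was SQL_ASCII (which performs no encoding validation at all).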