On 2/27/15, Adam Hooper <adam@xxxxxxxxxxxxxx> wrote:
> On Thu, Feb 26, 2015 at 9:50 PM, Melvin Call <melvincall979@xxxxxxxxx>
> wrote:
>
>> So my question is, how do I sanitize the hex character in the middle of a
>> word to be able to copy in Montreal with an accented e? Or am I going about
>> this at the wrong point?
>
> Hi Melvin,
>
> This is not a Postgres problem, and it is not a regex problem. So yes,
> you're going about it at the wrong point: you're trying to modify a
> _character_ at a time, but you _should_ be trying to modify a _byte_
> at a time. Text replacement cannot do what you want it to do.
>
> If you're on Linux or Mac, iconv will work -- for instance,
> `iconv --from-code=windows-1252 --to-code=utf-8 < input-file.txt > output-file.txt`
>
> Otherwise, you can use a text editor. Be sure to open the file
> properly (such that é appears) and then save it as utf-8.
>
> Alternatively, you could tell Postgres to use your existing encoding
> -- judging from the \xe9, any of "windows-1252", "iso-8859-15" or
> "iso-8859-1" will be accurate. But I always prefer my data to be
> stored as "utf-8", and you should, too.
>
> Read up on character sets here:
> http://www.joelonsoftware.com/articles/Unicode.html
>
> Enjoy life,
> Adam

Thank you Adam. I was able to make this work by adding the ENCODING 'latin1'
option to the COPY command per Vic's suggestion, as you also correctly pointed
out. However, iconv would probably do the trick too, now that I know where the
problem actually lies. I failed to realize that I was not dealing with UTF8,
because the MySQL data is encoded in UTF8, but you saw what I wasn't seeing.
Your suggested reading is also most appreciated. Maybe one of these days I
will actually make sense of this encoding issue. Thanks for the link.

Regards,
Melvin

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
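To make Adam's byte-versus-character point concrete, here is a minimal Python sketch (not from the thread) of roughly what the iconv command does: the raw bytes are decoded using the source encoding, then re-encoded as UTF-8. The function names `transcode` and `transcode_file` are hypothetical, and the sketch assumes the input really is windows-1252. Python's strict decoder raises on the handful of byte values windows-1252 leaves undefined, while "latin-1" (the ISO-8859-1 encoding behind Postgres's LATIN1) accepts every byte.

```python
# Minimal sketch, assuming the source file is windows-1252 encoded.
# transcode() and transcode_file() are hypothetical helper names.


def transcode(data: bytes, src: str = "windows-1252", dst: str = "utf-8") -> bytes:
    """Interpret raw bytes in the `src` encoding and return them encoded as `dst`."""
    # Decoding operates on bytes: in windows-1252 the single byte 0xE9 means U+00E9 (é).
    text = data.decode(src)
    # In UTF-8 the same character becomes the two bytes 0xC3 0xA9.
    return text.encode(dst)


def transcode_file(in_path: str, out_path: str) -> None:
    """Rough equivalent of: iconv --from-code=windows-1252 --to-code=utf-8 < in > out"""
    with open(in_path, "rb") as f_in:
        raw = f_in.read()
    with open(out_path, "wb") as f_out:
        f_out.write(transcode(raw))


if __name__ == "__main__":
    sample = b"Montr\xe9al"  # the lone 0xE9 byte that a UTF-8 COPY rejects
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("not valid UTF-8:", exc.reason)
    converted = transcode(sample)
    print(converted)                  # b'Montr\xc3\xa9al'
    print(converted.decode("utf-8"))  # Montréal
```

Regex or string replacement on the already-decoded text cannot produce this result, because the fix has to happen at the byte level, before the bytes are interpreted as characters.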