Search Postgresql Archives

Re: How to remove non-UTF values from a table?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 15, 2009 at 9:26 PM, Howard Cole <howardnews@xxxxxxxxxxxxx> wrote:
> Phoenix Kiula wrote:
>>
>> An easy question for some I hope.
>>
>> I have a DB from 8.2 days that when I now dump and try to take into
>> the 8.3.7, it gives me errors about utf-8 stuff.
>>
>> I tried searching this list's archives but could not come up with an
>> answer.
>>
>> Google returns some sites like these:
>> http://sniptools.com/databases/finding-non-utf8-values-in-postgresql -
>> but I'm not clear on how to use them.
>>
>> Following the SQL on this site I could identify some columns that
>> contain text like this:
>>
>>    "Évolution générale de la situation démographique"
>>
>> So my guess is that the non-English characters were originally not
>> getting written in proper utf-8 variants.
>>
>> Is there any SQL possibility to find these columns and replace them
>> with utf-8 equivalents using some postgresql commands? Couldn't find
>> anything in the "Strings functions" (chapter 9 of manual).
>>
>> We're on CentOS.
>>
>> Thanks!
>>
>>
>
> My recommendation would be to install the iconv utility and run it on a
> plain text (pg_dump -Fp) backup as suggested in the google article - and
> then reimport the clean UTF-8.
>
> I am surprised that you managed to install the original backup on 8.3
> because it seems to be much more strict on encoding - Unless your database
> is not in UTF-8?



Thanks Howard.

I ran the SQL and it finds anything that has non-English characters.
For example:


http://www.amazon.co.jp/%E3%83%A4%E3%83%9E%E3%83%80%E9%9B%BB%E6%A9%9F%E3%81%AE%E5%93%81%E6%A0%BC%E2%80%95No-1%E4%BC%81%E6%A5%AD%E3%81%AE%E6%BF%80%E5%AE%89%E5%93%B2%E5%AD%A6-%E7%AB%8B%E7%9F%B3-%E6%B3%B0%E5%89%87/dp/406214378X/ref=sr_1_1?ie=UTF8&s=books&qid=1199212694&sr=8-1


Part of this URL is actually in Japanese, but when I paste it in this
email it comes up with all these percentage signs. I suppose this is
"url encoded".

Shouldn't this be valid UTF-8? How does PG calculate if something is
not valid UTF-8?

Thanks.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux