Search Postgresql Archives

Re: finding bogus UTF-8

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Glenn Maynard wrote:
On Thu, Feb 10, 2011 at 2:02 PM, Scott Ribe <scott_ribe@xxxxxxxxxxxxxxxx <mailto:scott_ribe@xxxxxxxxxxxxxxxx>> wrote:

    I know that I have at least one instance of a varchar that is not
    valid UTF-8, imported from a source with errors (AMA CPT files,
    actually) before PG's checking was as stringent as it is today. Can
    anybody suggest a query to find such values?


I hit this problem too, if I remember correctly when trying to upgrade a database from 8.3 to 8.4. I ended up aborting the upgrade, since the upgrade documentation made no mention of this and I didn't have time to dig into it at the time. A tool to find all instances of this would be very helpful.

I'm about to pipe the ascii output of a database dump through a perl script that removes any unwanted characters. To help define what 'unwanted characters' are, compare the ord() values to decimal values at http://www.asciitable.com/

while (<>)
{
    $_ =~ s/(.)/((ord($1) >= 0) && (ord($1) <= 8))
        || ((ord($1) >= 11) && (ord($1) <= 31))
        || ((ord($1) >= 127)) ?"": $1/egs;
    print;
}

comments would be appreciated.

--
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux