Glenn Maynard wrote:
On Thu, Feb 10, 2011 at 2:02 PM, Scott Ribe <scott_ribe@xxxxxxxxxxxxxxxx
<mailto:scott_ribe@xxxxxxxxxxxxxxxx>> wrote:
I know that I have at least one instance of a varchar that is not
valid UTF-8, imported from a source with errors (AMA CPT files,
actually) before PG's checking was as stringent as it is today. Can
anybody suggest a query to find such values?
I hit this problem too, if I remember correctly when trying to upgrade a
database from 8.3 to 8.4. I ended up aborting the upgrade, since the
upgrade documentation made no mention of this and I didn't have time to
dig into it at the time. A tool to find all instances of this would be
very helpful.
I'm about to pipe the ascii output of a database dump through a perl
script that removes any unwanted characters. To help define what
'unwanted characters' are, compare the ord() values to decimal values at
http://www.asciitable.com/
while (<>)
{
$_ =~ s/(.)/((ord($1) >= 0) && (ord($1) <= 8))
|| ((ord($1) >= 11) && (ord($1) <= 31))
|| ((ord($1) >= 127)) ?"": $1/egs;
print;
}
comments would be appreciated.
--
Until later, Geoffrey
"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general