Re: Need magic that checks if a collumn contains values that CAN NOT be converted to Latin1

Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> · Thu, 29 Apr 2010 12:35:55 +0800

On 29/04/10 11:02, Andreas wrote:
Hi,

I've got an 8.4.3 Unicode DB that accidentally holds a few records with
characters that can't be converted to Latin1 or 9 for output to CSV.

I'd just need a way to check if a collumn contains values that CAN NOT
be converted from Utf8 to Latin1 to select all those affected records.

I tried:
Select convert_to (my_column::text, 'LATIN1') from my_table;

It raises an error that says translated:
ERROR: character 0xe28093 in encoding »UTF8« has no equivalent in »LATIN1«
Regrettably it doesn't explain where it found this sign.

I'd use a PL/PgSQL procedure to step through the result of a FOR IN 
SELECT, running each test in an exception handling block.

Select '\xe28093'
complains that this weren't a valid UTF8 code at all.
So how was it accepted and stored in the first place?

Because that's the escape E'\xe2" followed by the literal characters 
"8093".  "\xe2" followed by those characters isn't legal utf-8.

I'm pretty sure the error actually refers to:

  select E'\xe2\x80\x93';

which is the character "–" (U+2013 EN DASH).

Yes, it'd be nice if PostgreSQL's error message was in syntax that 
PostgreSQL understood, not pseudo-C-style hex literal form.

--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general