Use the Encode module to test and convert back
and forth between UTF-8 characters and bytes for the SQL_ASCII database.
Assuming the input is already UTF-8:

    use Encode qw(:all);

    # connect to db, prepare insert statement, etc.

    my $bytes = encode('utf8', $utf8_text);
    $sth->execute($bytes, $i)
        or errexit("execute of insert into public_suffixes tbl failed: ",
                   $DBI::errstr);

If your input is not already UTF-8, you
will have to wrap decode in an eval block to convert to UTF-8, then check
for failure before re-encoding and inserting into the database. Or something
similar. This seems to work for me.

When I need to pull the data back out of the database, I have to reconvert
from the byte string into UTF-8 characters before displaying the output.

Susan

From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Mike Blackwell

I have an older database that was created with SQL-ASCII encoding.
Over time users have managed to enter all manner of interesting
characters, mostly via cut and paste from Windows documents. I'm
attempting to clean up and eventually convert the database to UTF8. I've managed
to find most of the data that won't nicely convert from some-random-encoding to
UTF8, but it seems the users are entering it as fast as I can find it. Is there
a way the incoming data from a Perl CGI web application can be automatically
limited to UTF8 even though the database is SQL-ASCII? Mike |