Search Postgresql Archives

Encoding change question...

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



So I have myself a nice pickle here.....

I've got a database which was originally created with SQL_ASCII for the
encoding (anything goes text fields)

Unfortunately, I have a bunch of data that was encoded in UTF-8 that's
in an RSS feed that I need to load into said database.  iconv barfs all
over this file in an attempt to turn it into ISO-8859 (which is what the
web application currently attached to that database is emitting and
collecting.)  It appears the problem is (mostly) things like the
stylized double-quotes.

So I figured I'd go the other way, and convert what I have now in the
tables into UTF8.

Well, except that doesn't work either.

ticker=# select convert_to(subject, 'utf8') from post where
ordinal='2098167';
                           convert_to                          
----------------------------------------------------------------
 1%: the interest rate on IBM\222s most recent three-year bond.
(1 row)

\222 is the correct code point for the styled single apostrophe that is
in that place in ISO-8859-1 in the source.  However, the UTF prefix is
missing, as are the other two code-point characters (that is, I got the
code point but not the other two bytes that should be in front of it. 
And if I set the code page on the web site to UTF-8, and also set the
encoding on the SQL session to UTF-8 I don't get the three-byte code - I
just get the one byte.  That's a bust.

There are TWO fields in this database that need converted.  I presumed I
could do something like this:

# update post set field1 = convert_to(field1, 'utf8');

It runs to completion without complaint but produces the above.  No good.

So.... is there a way to do this?  I do NOT want to dump, iconv on the
dumped file, then reload the database if I can possibly avoid it.  Yes,
I know that will work (I've tested it on my development box), but being
able to do this "hot" would be DRAMATICALLY preferred.  If I do the
iconv route I am basically rebuilding the entire database with all
that's involved in doing so in terms of downtime and such.  VERY
undesirable.

(Equally-valid would be a way to convert UTF-8 to ISO-8859-1 (Latin1),
thereby allowing me to convert the incoming data stream to what's
already in the system, but thus far I've found no joy on that at all.)

Ideas?

-- Karl

begin:vcard
fn:Karl Denninger
n:Denninger;Karl
email;internet:karl@xxxxxxxxxxxxx
x-mozilla-html:TRUE
version:2.1
end:vcard

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux