The doublequotes isn't UTF8 it's people copying and pasting from Microsoft stuff, which is WIN-1252. So try to use that with iconv instead of utf8 On 2010-08-16 12:40:03PM -0500, Karl Denninger wrote: > So I have myself a nice pickle here..... > > I've got a database which was originally created with SQL_ASCII for the > encoding (anything goes text fields) > > Unfortunately, I have a bunch of data that was encoded in UTF-8 that's > in an RSS feed that I need to load into said database. iconv barfs all > over this file in an attempt to turn it into ISO-8859 (which is what the > web application currently attached to that database is emitting and > collecting.) It appears the problem is (mostly) things like the > stylized double-quotes. > > So I figured I'd go the other way, and convert what I have now in the > tables into UTF8. > > Well, except that doesn't work either. > > ticker=# select convert_to(subject, 'utf8') from post where > ordinal='2098167'; > convert_to > ---------------------------------------------------------------- > 1%: the interest rate on IBM\222s most recent three-year bond. > (1 row) > > \222 is the correct code point for the styled single apostrophe that is > in that place in ISO-8859-1 in the source. However, the UTF prefix is > missing, as are the other two code-point characters (that is, I got the > code point but not the other two bytes that should be in front of it. > And if I set the code page on the web site to UTF-8, and also set the > encoding on the SQL session to UTF-8 I don't get the three-byte code - I > just get the one byte. That's a bust. > > There are TWO fields in this database that need converted. I presumed I > could do something like this: > > # update post set field1 = convert_to(field1, 'utf8'); > > It runs to completion without complaint but produces the above. No good. > > So.... is there a way to do this? I do NOT want to dump, iconv on the > dumped file, then reload the database if I can possibly avoid it. Yes, > I know that will work (I've tested it on my development box), but being > able to do this "hot" would be DRAMATICALLY preferred. If I do the > iconv route I am basically rebuilding the entire database with all > that's involved in doing so in terms of downtime and such. VERY > undesirable. > > (Equally-valid would be a way to convert UTF-8 to ISO-8859-1 (Latin1), > thereby allowing me to convert the incoming data stream to what's > already in the system, but thus far I've found no joy on that at all.) > > Ideas? > > -- Karl > > begin:vcard > fn:Karl Denninger > n:Denninger;Karl > email;internet:karl@xxxxxxxxxxxxx > x-mozilla-html:TRUE > version:2.1 > end:vcard > > > -- > Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- =========================================================== Peter C. Lai | Bard College at Simon's Rock Systems Administrator | 84 Alford Rd. Information Technology Svcs. | Gt. Barrington, MA 01230 USA peter AT simons-rock.edu | (413) 528-7428 =========================================================== -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general