On 12/08/11 7:54 PM, Bruce Clay wrote:
> Is there a "proper" encoding type that I should use to load the word lists so they can be interoperable with the WordNet dataset that happily uses the UTF8 encoding?
some of your input data may be in other encodings, not UTF8, for instance LATIN1. if you can identify these, and use SET CLIENT_ENCODING=... before loading each one, postgres will convert the data to the database encoding as it comes in, and you should be able to import from the various data sources.
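otherwise, you might have to run the data through some sort of filter before you feed it to postgres, I dunno. a lone 0x82 is not valid in UTF8: it can only appear as a continuation byte inside a multibyte sequence, never at the start of a character, so data that contains it on its own is probably in some single-byte encoding.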
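a filter along those lines could look something like the sketch below. it is only a rough illustration: the function names `to_utf8` and `filter_stream` are made up for this example, and the fallback encoding (`cp1252`, i.e. WIN1252) is a guess, since the real encoding of the word lists isn't known. lines that already decode as UTF8 pass through untouched, everything else is assumed to be in the fallback encoding and gets transcoded.

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return raw unchanged if it is valid UTF-8, else transcode it.

    The fallback encoding is an assumption about the source data;
    bytes that are undefined in it are replaced with U+FFFD.
    """
    try:
        raw.decode("utf-8")  # validation only, result is discarded
        return raw
    except UnicodeDecodeError:
        text = raw.decode(fallback, errors="replace")
        return text.encode("utf-8")


def filter_stream(src, dst, fallback: str = "cp1252") -> None:
    """Copy a binary stream line by line, converting each line to UTF-8."""
    for line in src:
        dst.write(to_utf8(line, fallback))
```

to use it as a command-line filter you would call `filter_stream(sys.stdin.buffer, sys.stdout.buffer)` and pipe each word list through it before loading. checking line by line means a file with mixed encodings still comes out as clean UTF8, but a wrong fallback guess will silently produce the wrong characters, so it's worth eyeballing a few converted lines.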
--
john r pierce N 37, W 122
santa cruz ca mid-left coast