On 04/22/2011 08:00 AM, Geoffrey Myers wrote:
> We are moving our databases to new hardware soon, so we felt it would
> be a good time to get the encoding correct. Our databases are
> currently SQL_ASCII and we plan to move them to UTF8.
We are in the same boat, though fortunately only on one older server
that we are upgrading, and only for internal apps.
> So, as previously noted, there are certain characters that won't load
> into a UTF8 database from a dump of the SQL_ASCII database.
> Here's our problem. We planned on moving databases a few at a time.
> Problem is, there is a process that pushes data from one database to
> another. If this process attempts to push data from a SQL_ASCII
> database to a new UTF8 database and the data contains one of the
> characters mentioned above, the process fails.
> So, now the question is, is this even worth the effort?
> What is the harm in leaving our databases SQL_ASCII encoded?
SQL_ASCII is a synonym for "no encoding." You put in a stream of bytes
and that's what you get out. That's OK if the byte-stream has exactly
the same meaning to every application and user. If that's not the case
then you have bytes in your database but you don't know what those bytes
are supposed to represent.
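To make that concrete, here is a minimal Python sketch of the same
bytes read under different assumptions. The sample strings are made up
for illustration and nothing here touches a database; it only shows why
undeclared bytes are ambiguous.

```python
# The same stored bytes, interpreted three different ways.
raw = b"caf\xc3\xa9"  # what a UTF-8 client would send for "café"

as_utf8 = raw.decode("utf-8")      # the intended text
as_latin1 = raw.decode("latin-1")  # a Latin-1 reader sees mojibake
as_cp1252 = raw.decode("cp1252")   # so does a Windows-1252 reader

# The reverse case: a Latin-1 client's "café" is not valid UTF-8 at all,
# which is the kind of value a UTF8 database refuses to load.
legacy = "café".encode("latin-1")  # b'caf\xe9'
try:
    legacy.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False

print(repr(as_utf8), repr(as_latin1), repr(as_cp1252), valid)
```

A SQL_ASCII database will store either byte string without complaint,
so nothing catches the mismatch until some reader guesses wrong.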
In a way, it's like having a generic integer column but depending on the
user or the application, that column might represent a unix epoch
timestamp, an age in years, a salary, a weight in grams, furlongs per
fortnight, etc. And there is no indicator to say which it is. Not good.
We are in the final stages of cleaning up our last bit of non-UTF8 data,
and the somewhat silly example above is actually not far from the
truth. Because data arrived from web inputs, spreadsheet imports, the
command line, internal apps, etc., we have been faced with cleaning
tables where one row has only basic ASCII data, another has UTF8, and
others have various Microsoft encodings. With data like that
it's pretty much impossible to guarantee that even a simple web report
will display correctly.
The longer you wait, the worse it gets. Even though it is only a tiny
percentage of our data, cleaning it is still a pain.
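As a rough illustration of what that kind of cleanup involves (a sketch
only, not a description of any particular procedure), the Python below
sorts raw byte values into "ascii", "utf-8", or "other", and converts
the leftovers using a fallback encoding. The function names and the
choice of Windows-1252 as the fallback are assumptions made for the
example; real data may need a different fallback per source, and a
human should review whatever lands in "other".

```python
def classify(raw: bytes) -> str:
    """Return a rough label for how a byte string appears to be encoded."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not ASCII and not valid UTF-8: some legacy encoding, unknown here.
        return "other"


def to_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 if possible, else guess the (assumed) fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" marks bytes the fallback cannot map with U+FFFD
        # instead of raising, so bad rows can be found and reviewed later.
        return raw.decode(fallback, errors="replace")


# Hypothetical sample rows, one per kind of input described above.
rows = [b"plain text", "caf\u00e9".encode("utf-8"), b"\x93quoted\x94"]
for row in rows:
    print(classify(row), repr(to_text(row)))
```

Checking ASCII and UTF-8 first is deliberate: text in a legacy
single-byte encoding rarely happens to be valid UTF-8, so a value that
decodes cleanly as UTF-8 is very likely to be UTF-8.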
Is it worth it? Dunno - you have to calculate the cost/benefit. For us
it was a no-brainer to bite the bullet and do it.
Cheers,
Steve
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)