On Mon, Jan 24, 2011 at 12:16:46PM -0500, Geoffrey Myers wrote:
> We hope to identify the characters and fix them in the existing
> database, then convert. It appears to be very limited, but it would
> help if there was some way to identify these characters outside of
> simply doing the reload of the data and finding the errors.
>
> Hence the reason I asked about a resource that might identify the
> characters.

Short answer: any byte with the high bit set. You're going to need to
assign them a meaning. Additionally, you're going to have to fix your
code to output only correctly encoded data.

The suggestion to simply reload the database as if all the current data
was WIN1251 or Latin-9 is a fairly easy way of getting the database into
a reasonable format. The data would have to be checked, though.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@xxxxxxxxx> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle
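A minimal Python sketch (not from the original post) of the two steps described in the reply: first report every byte with the high bit set (value 0x80 or above), since those are the bytes that need a meaning assigned; then reinterpret the raw bytes as Latin-9 (ISO-8859-15) and re-encode them as UTF-8, which is roughly what reloading the data "as if" it were in that encoding does. The function names and the sample row are hypothetical. Passing "cp1251" instead of the default assumes the data is really WIN1251; that codec leaves byte 0x98 undefined and will raise UnicodeDecodeError on it. Either way the converted text still has to be checked by a person.

```python
def high_bit_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte with the high bit set."""
    return [(i, b) for i, b in enumerate(data) if b & 0x80]


def reinterpret_as_utf8(data: bytes, assumed_encoding: str = "iso8859_15") -> bytes:
    """Decode raw bytes as if they were in assumed_encoding, then re-encode as UTF-8.

    ISO-8859-15 (Latin-9) assigns a character to every byte value, so the
    default never raises; whether the guessed meaning is *right* is another
    matter and must be verified by hand.
    """
    return data.decode(assumed_encoding).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical row: 0xE9 is "é" and 0xA4 is the euro sign in Latin-9.
    row = b"caf\xe9 \xa4 5"
    for offset, value in high_bit_bytes(row):
        print(f"offset {offset}: 0x{value:02X}")
    print(reinterpret_as_utf8(row).decode("utf-8"))
```

Running the first function over a dump of the affected columns gives a list of suspicious positions without a trial reload; the second shows what those bytes would become under the assumed encoding.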