On Thu, Aug 7, 2008 at 9:38 AM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
> On Thu, Aug 7, 2008 at 1:16 AM, Sim Zacks <sim@xxxxxxxxxxxxxx> wrote:
>>
>>> I don't quite follow that...the whole point of utf8 encoded database
>>> is so that you can use text functions and operators without the bytea
>>> treatment.  As long as your client encoding is set up properly (so
>>> that data coming in and out is computed to utf8), then you should be
>>> ok.  Dropping to ascii is usually not the solution.  Your data
>>> inputting application should set the client encoding properly and
>>> coerce data into the unicode text type...it's really the only
>>> solution.
>>>
>> Email does not always follow a specific character set. I have tried
>> converting the data that comes in to utf-8 and it does not always work.
>> We receive Hebrew emails which come in mostly 2 flavors, UTF-8 and
>> windows-1255. Unfortunately, they are not compatible with one another.
>> SQL-ASCII and ASCII are different as someone on the list pointed out to
>> me. According to the documentation, SQL-ASCII makes no assumption about
>> encoding, so you can throw in any encoding you want.
>
> no, you can't! SQL-ASCII means that the database treats everything
> like ascii.  This means that any operation that deals with text could
> (and in the case of Hebrew, almost certainly will) be broken.  Simple
> things like getting the length of a string will be wrong.  If you are
> accepting unicode input, you absolutely must be using a unicode
> encoded backend.

er, I see the problem (single piece of text with multiple encodings
inside) :-).  ok, it's more complicated than I thought.  still, you
need to convert the email to utf8.  There simply must be a way,
otherwise your emails are not well defined.  This is a client side
problem...if you push it to the server in ascii, you can't use any
server side text operations reliably.

merlin
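
P.S. A rough sketch of what "convert on the client side" could look like, in Python. This is only an illustration under assumptions, not a tested recipe for your mail pipeline: the function name decode_part and the fallback order are made up for this example, and it assumes you can split a message into parts that each use a single encoding. It tries the charset the part declares (if any), then UTF-8, then windows-1255, and only then falls back to lossy decoding.

```python
from typing import Optional

# Hypothetical fallback order: the two encodings mentioned in this thread.
FALLBACK_CHARSETS = ("utf-8", "windows-1255")


def decode_part(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode one single-encoding chunk of an email into a Unicode string."""
    candidates = ([declared] if declared else []) + list(FALLBACK_CHARSETS)
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown charset name: try the next candidate.
            continue
    # Last resort: keep what is readable, mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# "shalom" in Hebrew, written with escapes to keep this source ASCII-only.
word = "\u05e9\u05dc\u05d5\u05dd"
as_utf8 = word.encode("utf-8")          # 2 bytes per Hebrew letter
as_1255 = word.encode("windows-1255")   # 1 byte per Hebrew letter

# Both flavors end up as the same Unicode text, ready to be sent as UTF-8.
assert decode_part(as_utf8) == decode_part(as_1255) == word
```

Trying UTF-8 before windows-1255 matters: a run of Hebrew letters in windows-1255 is not valid UTF-8, so the strict UTF-8 attempt fails and the fallback kicks in, whereas most byte sequences would decode "successfully" (but wrongly) as windows-1255. The same example also shows the length problem: the UTF-8 bytes for that word are twice as long as the word is in characters, which is the kind of mismatch you get when the server treats the data as plain bytes.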