Re: bytea encode performance issues

"Merlin Moncure" <mmoncure@xxxxxxxxx> · Thu, 7 Aug 2008 09:41:27 -0400

On Thu, Aug 7, 2008 at 9:38 AM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
> On Thu, Aug 7, 2008 at 1:16 AM, Sim Zacks <sim@xxxxxxxxxxxxxx> wrote:
>>
>>> I don't quite follow that...the whole point of utf8 encoded database
>>> is so that you can use text functions and operators without the bytea
>>> treatment.  As long as your client encoding is set up properly (so
>>> that data coming in and out is converted to utf8), then you should be
>>> ok.  Dropping to ascii is usually not the solution.  Your data
>>> inputting application should set the client encoding properly and
>>> coerce data into the unicode text type...it's really the only
>>> solution.
>>>
>> Email does not always follow a specific character set. I have tried
>> converting the data that comes in to utf-8 and it does not always work.
>> We receive Hebrew emails which come in mostly 2 flavors, UTF-8 and
>> windows-1255. Unfortunately, they are not compatible with one another.
>> SQL-ASCII and ASCII are different as someone on the list pointed out to
>> me. According to the documentation, SQL-ASCII makes no assumption about
>> encoding, so you can throw in any encoding you want.
>
> no, you can't! SQL-ASCII means that the database treats everything
> like ascii.  This means that any operation that deals with text could
> (and in the case of Hebrew, almost certainly will) be broken.  Simple
> things like getting the length of a string will be wrong.  If you are
> accepting unicode input, you absolutely must be using a unicode
> encoded backend.
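
To illustrate the length point, here is a rough Python sketch (not
PostgreSQL output; the helper names are made up for this example).  A
backend that makes no assumption about the encoding can only count
bytes, while an encoding-aware one counts characters:

```python
def char_length(text: str) -> int:
    """Length as an encoding-aware backend would report it: characters."""
    return len(text)


def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Length when the stored value is treated as opaque bytes."""
    return len(text.encode(encoding))


# The Hebrew word "shalom", written with escapes so the snippet itself
# does not depend on the encoding of the file it is pasted into.
word = "\u05e9\u05dc\u05d5\u05dd"

print(char_length(word))            # 4 characters
print(byte_length(word, "utf-8"))   # 8 bytes: each letter takes 2 bytes in UTF-8
print(byte_length(word, "cp1255"))  # 4 bytes: windows-1255 is single-byte
```

The same word has a different byte length depending on which encoding
it arrived in, so a byte count tells you nothing reliable about the text.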

er, I see the problem (single piece of text with multiple encodings
inside) :-).  ok, it's more complicated than I thought.  still, you
need to convert the email to utf8.  There simply must be a way,
otherwise your emails are not well defined.  This is a client side
problem...if you push it to the server in ascii, you can't use any
server side text operations reliably.
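
A minimal sketch of what that client side conversion could look like,
assuming a Python client and the standard library email module.  The
function names and the fallback order are assumptions made for this
example, not something from the original application: each text part is
decoded with its declared charset if that works, then strict UTF-8, then
windows-1255 (Python codec "cp1255"), and everything is re-encoded as
UTF-8 before it goes to the server.

```python
import email
from typing import Optional

# Assumed fallback order for this sketch.  UTF-8 goes first because it is
# strict: windows-1255 bytes for Hebrew letters are not valid UTF-8, so a
# wrong guess fails loudly instead of producing garbage.
FALLBACK_CHARSETS = ("utf-8", "cp1255")


def decode_bytes(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode raw bytes, trying the declared charset before the fallbacks."""
    candidates = ([declared] if declared else []) + list(FALLBACK_CHARSETS)
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset name, try the next one
    # Last resort: keep the message loadable and mark the undecodable bytes.
    return raw.decode("utf-8", errors="replace")


def email_text_as_utf8(raw_message: bytes) -> bytes:
    """Return the text/plain parts of an email, re-encoded as UTF-8."""
    msg = email.message_from_bytes(raw_message)
    pieces = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue  # skips multipart containers and attachments
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        pieces.append(decode_bytes(payload, part.get_content_charset()))
    return "\n".join(pieces).encode("utf-8")
```

Because every part is decoded on its own, a message that mixes a UTF-8
part with a windows-1255 part still comes out as a single UTF-8 string.
Guessing is only a fallback for mail that declares no charset or the
wrong one, and it can still guess wrong for other legacy encodings.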

merlin
