Re: Re: CHAR field with charset UTF8 and COLLATION UNICODE_CI_AI or UTF8PHP is loading white spaces

Lester Caine <lester@xxxxxxxxxxx> · Tue, 22 Nov 2016 20:16:04 +0000

On 22/11/16 18:01, Delmar Wichnieski wrote:
> 2016-11-22 12:42 GMT-02:00 Lester Caine <lester@xxxxxxxxxxx>:
> 
>> >  needs help to move
>> > the string to a variable that it can check if the UTF8 data is a single
>> > character or multiple characters.
>> >
>> >
> I believe it is a single byte; the goal is to simulate a boolean field,
> where I only use S for yes and N for no (the equivalent of Y and N in
> English).
> 
> There are no operations like 'upper' and 'lower'. The script is very
> simple, as shown in the pastebin links in the previous message.
> 
> S and N are in the range between 0 and 127 of the ASCII table and UTF-8
> says that only one byte is required to encode the first 128 ASCII
> characters (Unicode U+0000 to U+007F).
> 
> But even if it consumed 2, 3 or 4 bytes, UTF-8 should indicate the end of
> the character, so it would be enough to find the end, apply the inverse
> of the encoding algorithm to recover the code point, and we would have
> the character back. This is just a dream.
> 
> 
> If the situation presented along the thread is a problem, then more people
> should report it. Let's wait. I'll use trim for now, or a cast.
> 
> example
> 
> $q = $pdoconn->prepare("SELECT CODIGO, CAST(ACESSOSISTEMA AS VARCHAR(1)) AS
> ACESSOSISTEMA FROM USUARIO");
> 
> And the problem is solved. Or perhaps there is another solution I have
> not thought of.

That is perhaps the point. PHP on its own can't decide whether you need
to convert to a single ASCII byte, allow space for a single multi-byte
character, or something else. All PHP sees is a buffer containing a
number of bytes, and what comes over the wire from Firebird even strips
any trailing space characters, requiring the client end to untangle things.
If you want a Unicode string you have to handle it with the mbstring
functions, since a plain PHP string is a simple byte buffer and does not
know that it holds anything other than 8-bit data. Now a CHAR(1) could be
treated as a special case, but CHAR(2+) cannot be handled so easily. This
is one reason why the normal 'hack' for adding a boolean domain is to use
a SMALLINT rather than a CHAR and store NULL/0/1 ...
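
As a rough sketch of the client-side untangling for the S/N case: the
snippet below assumes the driver hands back the CHAR(1) UTF8 value as
'S' followed by three padding spaces, which is my reading of the symptom
reported in this thread rather than documented driver behaviour.
flagToBool() is a hypothetical helper name, not part of PDO or ADOdb.

```php
<?php
// Hypothetical helper: map an 'S'/'N' flag from a padded CHAR(1) UTF8
// column to a PHP bool. SQL NULL is passed through as null.
function flagToBool(?string $raw): ?bool
{
    if ($raw === null) {
        return null;
    }
    $value = rtrim($raw, ' ');   // strip trailing padding spaces only
    if ($value === 'S') {
        return true;
    }
    if ($value === 'N') {
        return false;
    }
    throw new UnexpectedValueException("Unexpected flag value: '$raw'");
}

// Assumed shape of the value: 'S' plus three padding spaces.
$raw = "S   ";
echo strlen($raw), "\n";      // 4
var_dump(flagToBool($raw));   // bool(true)
```

Because S and N are plain ASCII, rtrim() is safe here without mbstring;
that would not hold for a check that depends on character positions.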

I'm not saying that the current results are correct, just that without
native handling of Unicode one has some edge cases which could be
resolved in different ways. Returning a Unicode CHAR field as a fixed
number of 32-bit characters has an attraction when one needs to work
with particular fixed character positions in the string, but while
UNICODE_FSS was designed with that in mind, UTF8 *IS* the right way
forward once everybody actually supports it ;)
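
To illustrate why fixed character positions are awkward with UTF-8 held
in a plain PHP string, here is a small sketch. It assumes the mbstring
extension is loaded; the sample word is just an illustration and does
not come from the database in question.

```php
<?php
// Assumes the mbstring extension is loaded.
$s = "S\u{E3}o";   // "São"; the escape keeps this source file ASCII-only

$bytes = strlen($s);               // counts bytes
$chars = mb_strlen($s, 'UTF-8');   // counts code points

// Byte offset 1 is only the first half of the two-byte "ã" ...
$secondByte = bin2hex($s[1]);
// ... while character position 1 is the whole "ã".
$secondChar = mb_substr($s, 1, 1, 'UTF-8');

printf("%d bytes, %d characters\n", $bytes, $chars);
// 4 bytes, 3 characters
printf("byte 1 = 0x%s, char 1 = %s\n", $secondByte, $secondChar);
// byte 1 = 0xc3, char 1 = ã
```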

One of my pet gripes is that simple PHP variables do not play well with
database fields. Rather than having to pull in mbstring, what is needed
is to extend 'string' so that it can be flagged as utf8 and handle a
utf8 field natively. The fact that Firebird is capable of using a
different collation for each field is not something that PHP
understands, and that is another reason I don't use PDO at all in
production. With ADOdb one has a bit more access to the metadata for
the query.

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
