Re: Re: CHAR field with charset UTF8 and COLLATION UNICODE_CI_AI or UTF8PHP is loading white spaces

Lester Caine <lester@xxxxxxxxxxx> · Tue, 22 Nov 2016 14:42:17 +0000

On 22/11/16 13:56, Delmar Wichnieski wrote:
> But VARCHAR fields work correctly. Problem only in CHAR.

VARCHAR is trimmed to the number of bytes used ... not the number of
'characters'! CHAR is only designed for single byte characters IN PHP so
providing multi byte characters to a CHAR(1) field does not know how
many actual characters are displayed. I'm not saying what is currently
happening is right, but it is 'safe' since PHP then needs help to move
the string to a variable that it can check if the UTF8 data is a single
character or multiple characters.

> And it should not be gambol, because
> "Each UTF is reversible, thus every UTF supports lossless round tripping:
> mapping from any Unicode coded character sequence S to a sequence of bytes
> and back will produce S again."
> Source
> http://unicode.org/faq/utf_bom.html

Provided that there is no processing of the data then that is correct,
but operations like 'upper' and 'lower' can result in a change in number
of characters, and the addition of accent characters can also result in
differences. It is this area that basically stopped the development of a
UTF8 native PHP6. Normalization in http://www.unicode.org/reports/tr15/
is a minefield even for the Firebird collation process ... Just how long
is the normalized string?

> 2016-11-22 11:21 GMT-02:00 Lester Caine <lester@xxxxxxxxxxx>:
> 
>> > On 22/11/16 12:58, Delmar Wichnieski wrote:
>>> > > Since there was no answer here on the list, I was feeling alone and
>> > afraid
>>> > > and wondering why no one else has this problem.
>> >
>> > Delmar I must apologise as I HAD posted a reply, but it did not actually
>> > go through ... list in bounce emails mode which I missed ...
>> >
>> > The simple answer is that strings in PHP are not UTF8 so the 'bug' you
>> > are listing is actually that we need to make sure that the single byte
>> > buffer for a string is long enough. To ensure UTF8 strings to be handled
>> > properly since PHP6 is not going to happen, we have to transfer the
>> > simple php strings to mbstring objects. UTF8 is a gambol in PHP if it is
>> > going to be transferred properly as a simple string variable and will
>> > give string length as bytes rather than characters ...

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

-- 
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php