Re: Re: 0x9f54


 



Man-wai Chang wrote:
>>     On the other hand, I remember you talked about the type of that
>> column to be char(2).  Have you specified what encoding it's using?
>> Moreover, I hope you're not using legacy encoding like Big5 or GB.  Use
>> Unicode (UTF-8) if your database is a brand new one.
>>     
>
> Unfortunately, I am still using Big5. you need a longer field to store
> utf-8 codes for the same big5 string right?
>   
    Yes.  In Big5 every Chinese character takes two bytes, whereas in
UTF-8 almost every Chinese character takes three bytes (and four bytes
for the rare ones outside the Basic Multilingual Plane, such as some
characters found only in ancient texts).  This is the price of UTF-8's
design: it stays byte-compatible with ASCII so that old 8-bit
data-processing functions keep working.  In other words, a string of
pure Chinese characters is 50% longer in UTF-8 than in Big5 (1.5 times
the size).
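    If you want to check the arithmetic yourself, here is a minimal
sketch in Python (the sample string is my own, and I am assuming
Python's built-in "big5" codec is close enough to whatever Big5 variant
your database uses):

```python
# Compare the encoded size of the same Chinese text in Big5 and UTF-8.
text = "中文字串"  # sample data: four common traditional Chinese characters

big5_bytes = text.encode("big5")
utf8_bytes = text.encode("utf-8")

print(len(text), "characters")
print(len(big5_bytes), "bytes in Big5")
print(len(utf8_bytes), "bytes in UTF-8")

# Size of the UTF-8 form relative to the Big5 form for this sample.
ratio = len(utf8_bytes) / len(big5_bytes)
print("UTF-8 / Big5 size ratio:", ratio)
```

    So a column sized in bytes for N Big5 characters needs roughly 1.5
times the room to hold the same text in UTF-8.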

    You could, of course, use UTF-16 as the base format for your
strings.  In that case every character in common use takes 2 bytes, be
it a Western Latin character or an Eastern CJK character.  OK, yes, the
rare characters outside the Basic Multilingual Plane take 4 bytes (a
surrogate pair), but you will seldom meet them.
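    The same kind of check works for UTF-16.  This is only a sketch:
the helper name encoded_sizes and the three sample characters are my
own choices, and "utf-16-le" is used so that the byte-order mark does
not inflate the count:

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-16 size, UTF-8 size) in bytes for the given text."""
    # "utf-16-le" avoids the 2-byte byte-order mark that "utf-16" prepends.
    return len(ch.encode("utf-16-le")), len(ch.encode("utf-8"))

samples = {
    "Latin letter": "A",
    "common CJK character": "中",
    "rare CJK character outside the BMP": "\U00020000",
}

for label, ch in samples.items():
    utf16_len, utf8_len = encoded_sizes(ch)
    print(f"{label}: {utf16_len} bytes in UTF-16, {utf8_len} bytes in UTF-8")
```

    Note the trade-off: UTF-16 is smaller for Chinese text but doubles
the size of plain ASCII text compared with UTF-8.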

    Anyway, you should look at the positive side of using Unicode
instead of the dinosaur encoding, sorry, I mean Big5 :p  Hard drives
(and RAM) are getting really big nowadays, so string size should no
longer be the first criterion when choosing an encoding.

    Unicode is maintained by an international consortium and supports
most of the world's languages.  For instance, using Big5, you can't even
represent the simplest Western European characters, like those in
these words: español or français!!  But you can represent them in
Unicode.  Actually, the ability to represent (Western) European
characters might not interest you.  But with Unicode you can store
both traditional and simplified Chinese!  And that, I'm sure, does
interest you.  You can't do that in Big5, I'm 100% sure!

    Still not convinced?  Well, Unicode even contains traditional
Chinese characters that Big5 doesn't support.  For example, a friend of
mine has the character 驊 in his first name.  This character isn't
supported in Big5, and in the pre-Unicode days he had to type (馬華)!
Very stupid!  Another example: 氹 is quite a common character in
southern China, but it can't be found in Big5.
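    You can test which strings survive a given encoding with a few
lines of Python.  Treat this as a sketch: can_encode is a helper name I
made up, and the answers for Big5 depend on the mapping table of
Python's "big5" codec, which may differ from other Big5 variants (ETEN
extensions, HKSCS and so on):

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Sample strings taken from the discussion above, plus plain Chinese text.
for sample in ["中文", "español", "français", "氹"]:
    print(sample,
          "| big5:", can_encode(sample, "big5"),
          "| utf-8:", can_encode(sample, "utf-8"))
```

    Running a check like this over your existing data before a
migration shows which values would be lost or mangled if they had to
go back into a Big5 column.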

    So, think about using Unicode.  It's 2007, so be a modern man!




-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

