RE: Convert UTF-8 to PHP defines

At 7:11 PM +0100 5/27/10, Ashley Sheridan wrote:
On Thu, 2010-05-27 at 14:06 -0400, Bob McConnell wrote:
 > From: Ashley Sheridan
 > > I don't use the higher range of characters often, but I do sometimes use
 > > them for things like the graphical glyphs (1?2)&, etc). I know I could do
 > > those with regular text and the Wingdings font, but that's not available
 > > on every computer, and breaks the semantic meaning behind the glyphs.

What higher range? ASCII defined only 128 values, the bottom 32 being control characters that don't print. Anything outside of that is not ASCII, but a proprietary extension. In particular, the glyphs usually associated with 0-31 and 128-255 are IBM-specific and not guaranteed to be present outside of their original video ROM. So only the first 128 characters map directly into UTF-8.

 Bob McConnell

Ref: pp 25-29 The Programmer's PC Sourcebook, 1988, Thom Hogan, Microsoft Press


The higher range of UTF-8 characters that don't map to ASCII values.

Thanks,
Ash

Bob:

I understood what Ash was referring to with his "higher range" statement, but his second statement was somewhat confusing.

ASCII is defined as characters having a value of 0-127 DEC (00-7F HEX). The "higher range" of 128-255 DEC (80-FF HEX) has been loosely characterized as "extended ASCII", but was never officially declared as such. Both M$ and Apple put their own characters in that range and assigned different characters to the same values, so text written under one mapping displayed incorrectly under the other, and problems arose in using either. I do not know if the problem was ever resolved. It's probably best never to use such characters.
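To see that clash concretely, here is a small Python sketch (Python is used only for illustration). It assumes Python's built-in `cp1252` (Windows-1252) and `mac_roman` (Mac OS Roman) codecs as stand-ins for the Microsoft and Apple 8-bit character sets; the helper names `decode_high_byte` and `is_valid_utf8` are hypothetical.

```python
# Decode one "higher range" byte (128-255) under two vendor 8-bit character sets.
# Assumption: cp1252 stands in for Microsoft, mac_roman for Apple.
VENDOR_CODECS = ("cp1252", "mac_roman")


def decode_high_byte(value: int) -> dict:
    """Return what each vendor code page shows for a single byte value.

    Note: cp1252 leaves a few values (e.g. 0x81) undefined, so decoding
    those raises UnicodeDecodeError.
    """
    if not 128 <= value <= 255:
        raise ValueError("expected a byte in the 'higher range' 128-255")
    raw = bytes([value])
    return {codec: raw.decode(codec) for codec in VENDOR_CODECS}


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8; a lone byte in 128-255 never does."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for value in (0x80, 0xA5):
        print(hex(value), decode_high_byte(value), is_valid_utf8(bytes([value])))
```

Byte 0x80 shows up as € under Windows-1252 but as Ä under Mac OS Roman, and on its own it is not valid UTF-8 at all.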

Unicode uses the same values (i.e., "code points") as ASCII for the lower range, namely 0-127, and UTF-8 (an 8-bit, variable-width encoding) encodes each of those as the same single byte that ASCII uses. UTF-8 is therefore really a superset that includes ASCII as a subset.
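The superset property falls out of how UTF-8 lays out its bytes. Below is a minimal hand-written encoder in Python, for illustration only (the name `utf8_encode` is hypothetical; real code would simply call `str.encode("utf-8")`). Code points 0-127 come out as the identical single byte, while everything higher uses 2 to 4 bytes that all have the top bit set, so they can never be mistaken for ASCII.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp < 0x80:
        # 1 byte, 0xxxxxxx: exactly the ASCII byte.
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # ASCII range: identical single bytes.
    assert all(utf8_encode(cp) == bytes([cp]) for cp in range(128))
    # Higher range: multi-byte, every byte >= 0x80, same as Python's encoder.
    for ch in ("\u00e9", "\u2714", "\U0001F600"):
        encoded = utf8_encode(ord(ch))
        assert encoded == ch.encode("utf-8")
        assert all(b >= 0x80 for b in encoded)
        print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```

Because no byte of a multi-byte sequence falls in 0-127, a program that only understands ASCII can pass UTF-8 text through without corrupting the ASCII parts.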

Much of what Ash uses the "Wingdings" font for is available as real characters in Unicode's "Dingbats" block (U+2700 to U+27BF), as shown here:

http://www.unicode.org/charts/PDF/U2700.pdf
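Each glyph in that chart is an ordinary Unicode character with its own code point and name. As a hedged Python sketch (the helpers `describe` and `is_dingbat` are hypothetical), the standard `unicodedata` module can look up a dingbat's name, and the same code point can be written into HTML as a numeric character reference rather than relying on a font mapping.

```python
import unicodedata

# The Unicode "Dingbats" block spans U+2700 to U+27BF.
DINGBATS = range(0x2700, 0x27C0)


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME &#xXXXX;' for a single character."""
    cp = ord(ch)
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{cp:04X} {name} &#x{cp:X};"


def is_dingbat(ch: str) -> bool:
    """True if the character's code point lies in the Dingbats block."""
    return ord(ch) in DINGBATS


if __name__ == "__main__":
    for ch in ("\u2702", "\u2708", "\u2714"):
        print(describe(ch), is_dingbat(ch))
```

Whether the glyph actually appears still depends on the reader having some font that covers it, but the meaning travels with the character itself.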

These are real characters that can be used for all sorts of things, including URLs. For example:

http://xn--gci.com

Please forgive the Punycode URL, but IE does not show non-ASCII characters in URLs, whereas Safari displays the URL correctly (the domain above is a heavy check mark, U+2714, followed by ".com"). Clearly, Safari has the upper hand in resolving "other than English" issues (perhaps that's why Apple's overseas profits last year exceeded its domestic ones), but I digress.
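The `xn--` form is the ASCII-compatible (Punycode) encoding that IDN-aware software converts to and from. A small Python sketch using the standard library's `idna` codec shows the round trip. Note that this codec implements the older IDNA 2003 rules, and the newer IDNA 2008 rules disallow most symbol characters, so treat this as illustrative; the helper names are hypothetical.

```python
def host_to_unicode(ace_host: str) -> str:
    """Decode an ASCII (xn--) host name to its Unicode form (IDNA 2003)."""
    return ace_host.encode("ascii").decode("idna")


def host_to_ascii(unicode_host: str) -> str:
    """Encode a Unicode host name to its ASCII (xn--) form (IDNA 2003)."""
    return unicode_host.encode("idna").decode("ascii")


if __name__ == "__main__":
    shown = host_to_unicode("xn--gci.com")
    print(shown, f"U+{ord(shown[0]):04X}")
    print(host_to_ascii(shown))
```

The label after the `xn--` prefix ("gci") is plain Punycode; decoding it on its own yields the single character U+2714.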

Using UTF-8 encoding for everything (web and PHP) should cause far fewer problems globally than trying to fight it.
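The usual symptom of mixing encodings is "mojibake": UTF-8 bytes read back as a legacy code page. A minimal Python sketch, assuming Windows-1252 (`cp1252`) as the wrong guess and using hypothetical helpers `misread` and `round_trip`, shows what goes wrong and why keeping every layer on UTF-8 avoids it.

```python
def misread(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode as UTF-8, then (wrongly) decode with a legacy code page."""
    return text.encode("utf-8").decode(wrong_codec)


def round_trip(text: str) -> str:
    """Encode and decode with UTF-8 on both ends: the text survives intact."""
    return text.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    for sample in ("caf\u00e9", "\u2714 done"):
        print(repr(misread(sample)), repr(round_trip(sample)))
```

For example, "café" comes back as "cafÃ©" when misread, while the all-UTF-8 round trip returns it unchanged.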

Here are some references that may help:

[1] <http://webstandardsgroup.org/>
[2] <http://www.w3.org/People/Ishida/>
[3] <http://www.w3.org/International>
[4] <http://shiflett.org/archive/177>
[5] <http://en.wikipedia.org/wiki/Universal_character_set>
[6] <http://www.unicode.org/>

Cheers,

tedd

--
-------
http://sperling.com  http://ancientstones.com  http://earthstones.com

--
PHP General Mailing List (http://www.php.net/)