At 7:11 PM +0100 5/27/10, Ashley Sheridan wrote:
On Thu, 2010-05-27 at 14:06 -0400, Bob McConnell wrote:
> From: Ashley Sheridan
> > I don't use the higher range of characters often, but I do sometimes use
> them for things like the graphical glyphs (1?2)&, etc) I know I could do
> those with regular text and the Wingdings font, but that's not available
> on every computer, and breaks the semantic meaning behind the glyphs.
What higher range? ASCII only defined 128
values, the bottom 32 being control characters
that don't print. Anything outside of that is
not ASCII, but a proprietary extension. In
particular, the glyphs usually associated with
0-32 and 128-255 are IBM specific and not
guaranteed to be present outside of their
original video ROM. So only the first 128
characters map directly into UTF-8.
Bob McConnell
Ref: pp 25-29 The Programmer's PC Sourcebook,
1988, Thom Hogan, Microsoft Press
The higher range of utf8 characters that don't map to ascii values.
Thanks,
Ash
Bob:
I understood what Ash was referring to with his
"higher range" statement, but his second
statement was somewhat confusing.
ASCII is defined as characters having a value of
0-127 DEC (00-7F HEX). The "higher range" of
128-255 DEC (80-FF HEX) have been loosely
characterized as "extended ASCII" but have not
been officially declared such. Both M$ and Apple
have their own characters appearing in that range
and have used different characters for different
things -- thus problems arose in using either. I
do not know if the problem was ever resolved.
It's probably best to never use such characters.
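To make the 128-255 problem concrete, here is a
minimal Python sketch that decodes one and the same
byte under three legacy 8-bit code pages. The codec
names are the ones shipped with Python; the helper
name decode_everywhere and the choice of byte 0x80
are only illustrative assumptions.

```python
import unicodedata

# Legacy 8-bit code pages: IBM PC, Windows Western, classic Mac OS.
CODE_PAGES = ("cp437", "cp1252", "mac_roman")


def decode_everywhere(byte_value):
    """Decode a single byte value under each code page in CODE_PAGES.

    Note: a few byte values are undefined in some code pages
    (for example 0x81 in cp1252) and would raise UnicodeDecodeError.
    """
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in CODE_PAGES}


if __name__ == "__main__":
    # Same byte, three different characters.
    for name, char in decode_everywhere(0x80).items():
        print(f"{name:10} 0x80 -> U+{ord(char):04X} {unicodedata.name(char)}")
```

A byte below 128, such as 0x41, decodes to the same
character in all three, which is the only range the
code pages agree on.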
The Unicode database uses the same lower
character values (i.e., "code points") as does
ASCII, namely 0-127, and thus UTF-8 (an 8-bit
variable-width encoding) is really a superset
of ASCII.
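A short Python sketch of that superset relationship
follows. It only assumes the standard "ascii" and
"utf-8" codecs; the helper name utf8_hex is made up
for the illustration.

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


# Code points 0-127 produce identical bytes in ASCII and UTF-8.
assert "PHP".encode("ascii") == "PHP".encode("utf-8")

if __name__ == "__main__":
    # Code points above 127 need more than one byte in UTF-8.
    for sample in ("A", "\u00e9", "\u2714"):
        print(f"U+{ord(sample):04X} -> {utf8_hex(sample)}")
```

Pure ASCII text is therefore already valid UTF-8,
while anything above 127 becomes a multi-byte
sequence.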
The "Wingdings" font that Ash refers to has its
closest Unicode counterpart in the "Dingbats"
block, as shown here:
http://www.unicode.org/charts/PDF/U2700.pdf
These are real characters that can be used for
all sorts of things including URLs, for example:
http://xn--gci.com
Please forgive the Punycode URL, but IE does not
recognize "other than ASCII" characters in URLs,
whereas Safari will show the URL correctly.
Clearly, Safari has the upper hand in resolving
"other than English" issues -- perhaps that's why
their overseas profits last year exceeded their
domestic -- but I digress.
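For anyone curious what the "xn--gci" label in that
URL stands for, the sketch below decodes it with
Python's built-in "punycode" codec. The helper names
are hypothetical, and this is only the raw Punycode
step: a real IDN implementation also normalizes and
validates the labels, which is skipped here.

```python
ACE_PREFIX = "xn--"  # marks a Punycode-encoded host name label


def label_to_unicode(label):
    """Decode one host name label if it carries the ACE prefix."""
    if label.lower().startswith(ACE_PREFIX):
        encoded = label[len(ACE_PREFIX):]
        return encoded.encode("ascii").decode("punycode")
    return label


def host_to_unicode(host):
    """Decode every label of a dotted host name."""
    return ".".join(label_to_unicode(part) for part in host.split("."))


if __name__ == "__main__":
    decoded = host_to_unicode("xn--gci.com")
    first = decoded[0]
    print(f"first character: U+{ord(first):04X}")
```

The label decodes to U+2714 (HEAVY CHECK MARK), one
of the characters in the Dingbats chart linked above.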
The use of UTF-8 encoding in everything (web and
php) should present far fewer problems globally
than trying to fight it.
Here are some references that may help:
[1] <http://webstandardsgroup.org/>
[2] <http://www.w3.org/People/Ishida/>
[3] <http://www.w3.org/International>
[4] <http://shiflett.org/archive/177>
[5] <http://en.wikipedia.org/wiki/Universal_character_set>
[6] <http://www.unicode.org/>
Cheers,
tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php