At 9:29 PM +0200 3/19/08, Dotan Cohen wrote:
I am asking the second question: how many Hebrew characters in a
string that _very_likely_ contains other characters as well. The array
suggestion sounds about what I am doing: checking if each letter is a
Hebrew character.
I will also look into the mb_ functions. I did not know about them
before. Thanks.
Dotan Cohen
Dotan:
It really doesn't make any difference.
If the string contains even a single character that is not ASCII,
then the byte-based string functions can't be trusted and you'll need
to use the mb_ functions.
Unicode assigns a code point to every character it covers, and the
first 128 code points have the same values as ASCII -- so there's no
conflict between code points and ASCII.
The beyond-ASCII string problem is basically: what is a character? We
all know what an "a" is, but what about "a" with a "~" above it? Is
it one character or two? It can be stored as one code point or as a
combination of two ("a" plus a combining tilde); either way, what the
reader sees as one character is called a grapheme.
What about the character "fi" when it's combined? Is it one character
or two? In this case, it's a ligature and is a single code point.
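
To make that concrete, here is a rough sketch (not a definitive test)
of how the three cases measure up. It assumes PHP 7 or later for the
"\u{...}" escapes and that the mbstring extension is loaded; the
variable names are just for illustration.

```php
<?php
// Assumption: PHP 7+ (for "\u{...}" escapes) with mbstring available.
$precomposed = "\u{00E3}";   // "a" with tilde stored as ONE code point
$combined    = "a\u{0303}";  // "a" followed by a combining tilde: TWO code points
$ligature    = "\u{FB01}";   // the "fi" ligature: ONE code point

foreach ([$precomposed, $combined, $ligature] as $s) {
    // strlen() counts bytes; mb_strlen() counts code points.
    printf("bytes=%d code points=%d\n", strlen($s), mb_strlen($s, 'UTF-8'));
}

// grapheme_strlen() lives in the optional intl extension, so it is
// only called when that extension happens to be installed.
if (function_exists('grapheme_strlen')) {
    printf("graphemes in combined form=%d\n", grapheme_strlen($combined));
}
```

Note that mb_strlen() still reports two for the combined form, since
it counts code points and not what the reader sees.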
So, when you are trying to count characters in a string, the
byte-based string functions won't work because they might count one
character as two or more, and can break a character into parts. Or
the character might actually be two code points that should be
counted as one. The mb_ functions are designed for the first problem
(they count characters rather than bytes), whereas the standard
string functions are not. As far as I know, they still count a
combining sequence as two, so that second case needs something
grapheme-aware.
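
For your actual question, something along these lines might do as a
starting point. It is only a sketch: count_hebrew_letters() is a name
I made up, it assumes the text is UTF-8, and it counts only the
letters alef through tav (U+05D0 to U+05EA), so vowel points and
Hebrew punctuation are deliberately ignored.

```php
<?php
// Hypothetical helper, not part of PHP: counts Hebrew letters
// (alef U+05D0 through tav U+05EA) in a UTF-8 string.
function count_hebrew_letters($text)
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // so each match is one whole character, never a stray byte.
    $found = preg_match_all('/[\x{05D0}-\x{05EA}]/u', $text, $matches);

    // preg_match_all() returns false on error (e.g. invalid UTF-8).
    return $found === false ? 0 : $found;
}

$s = "שלום world";  // assumes this file is saved as UTF-8

echo strlen($s), "\n";               // 14: bytes, each Hebrew letter takes two
echo mb_strlen($s, 'UTF-8'), "\n";   // 10: characters
echo count_hebrew_letters($s), "\n"; // 4: Hebrew letters only
```

The regular expression does the per-character check in one pass, so
there is no need to split the string into an array first.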
The easy way to tell whether you should use the mb_ functions: if all
the characters you're working with appear in the ASCII table, then
the standard string functions apply. However, if any of the characters
are not found in ASCII, then you need to go another route.
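
If you want the code to make that decision for you, a check like the
following could work. Again, just a sketch: is_plain_ascii() and
safe_length() are hypothetical names of my own, and UTF-8 input is
assumed.

```php
<?php
// Hypothetical helper: true when every byte is in the ASCII range.
function is_plain_ascii($text)
{
    // No /u modifier here on purpose: the pattern looks at raw bytes,
    // and any byte above 0x7F means the string is not pure ASCII.
    return preg_match('/[^\x00-\x7F]/', $text) === 0;
}

// Hypothetical helper: picks the byte-based or multi-byte length.
function safe_length($text)
{
    return is_plain_ascii($text)
        ? strlen($text)
        : mb_strlen($text, 'UTF-8');
}

var_dump(is_plain_ascii("hello"));  // bool(true)
var_dump(is_plain_ascii("שלום"));   // bool(false)
```

Since mb_strlen() gives the same answer as strlen() for pure ASCII,
always calling the mb_ version is probably simpler in practice; the
check mainly shows where the dividing line is.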
At least, that's my understanding.
Cheers,
tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com
--
PHP General Mailing List (http://www.php.net/)