On 20/03/2008, tedd <tedd.sperling@xxxxxxxxx> wrote:
> At 9:29 PM +0200 3/19/08, Dotan Cohen wrote:
> >I am asking the second question: how many Hebrew characters in a
> >string that _very_likely_ contains other characters as well. The array
> >suggestion sounds about what I am doing: checking if each letter is a
> >Hebrew character.
> >
> >I will also look into the mb_ functions. I did not know about them
> >before. Thanks.
> >
> >Dotan Cohen
>
>
> Dotan:
>
> It really doesn't make any difference.
>
> If you have a single character that is not ASCII, then it's something
> beyond ASCII and you'll need to use the mb_functions.
>
> Unicode contains all known characters (code points) including ASCII
> with values equal to ASCII -- so there's no problem between code
> points and ASCII.
>
> The beyond ASCII string problem is basically what is a character? We
> all know what an "a" is, but what about "a" with a "~" above it? Is
> it one character or two? If it's a combination of two code points,
> then it's a grapheme.
>
> What about the character "fi" when it's combined? Is it one character
> or two? In this case, it's a ligature and is a single code point.
>
> So, when you are trying to count characters in a string, using ASCII
> based functions won't work because they might count one character as
> two and break the character in two parts. Or, the character might be
> actually two characters, but they should be counted as one. As such,
> mb_functions are designed to work with these types of problems where
> as standard string functions won't.
>
> The easy way to tell IF you should use mb_functions is if all the
> characters you're working with appear in the ASCII table, then
> standard string functions apply. However, if any of the characters
> are not found in ASCII, then you need to go another route.
>
> At least, that's my understanding.
>
>
> Cheers,
>
> tedd

Thank you Tedd, that was very helpful. After reading your mail from
yesterday I went to wikipedia to learn what graphemes and ligatures
are.
Your example of "fi" was there, otherwise I would have had no
idea that those letters can be combined. In Hebrew and Arabic,
especially, I can see how the vowel points (Hebrew) and combinations
like "LA" (Arabic) can confuse the ASCII function. Thanks.

Dotan Cohen

http://what-is-what.com
http://gibberish.co.il
א-ב-ג-ד-ה-ו-ז-ח-ט-י-ך-כ-ל-ם-מ-ן-נ-ס-ע-ף-פ-ץ-צ-ק-ר-ש-ת

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
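
For anyone following the thread, the byte-versus-character distinction
tedd describes, and the original question of counting Hebrew letters in
a mixed string, can be sketched in PHP roughly as follows. This is only
a minimal sketch and was not posted in the thread: it assumes UTF-8
input, PHP 7 or later (for the "\u{...}" escapes), and the mbstring
extension. The function name countHebrewLetters and the sample string
are hypothetical.

```php
<?php
// Hypothetical sample: ASCII text, then shin + qamats (a vowel point),
// lamed, vav, final mem, then more ASCII. Assumed to be UTF-8.
$text = "abc \u{05E9}\u{05B8}\u{05DC}\u{05D5}\u{05DD} 123";

// Count code points in the Hebrew letter range alef..tav (U+05D0..U+05EA).
// The /u modifier makes PCRE treat the subject as UTF-8, not raw bytes.
// Vowel points such as U+05B8 fall outside this range and are not counted.
function countHebrewLetters(string $s): int
{
    $n = preg_match_all('/[\x{05D0}-\x{05EA}]/u', $s);
    return $n === false ? 0 : $n; // false means the input was not valid UTF-8
}

echo strlen($text), "\n";             // 18: bytes (each Hebrew code point is 2 bytes)
echo mb_strlen($text, 'UTF-8'), "\n"; // 13: code points, vowel point included
echo countHebrewLetters($text), "\n"; // 4: Hebrew letters only
```

Note that mb_strlen() reports 13 here although a reader sees 12
characters, because the vowel point is a separate code point that
combines with the letter before it. That is the grapheme issue tedd
mentions; the mb_ functions count code points, not graphemes.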