Re: Checking how many letters are in a string.

On 20/03/2008, tedd <tedd.sperling@xxxxxxxxx> wrote:
> At 9:29 PM +0200 3/19/08, Dotan Cohen wrote:
>  >I am asking the second question: how many Hebrew characters in a
>  >string that _very_likely_ contains other characters as well. The array
>  >suggestion sounds about what I am doing: checking if each letter is a
>  >Hebrew character.
>  >
>  >I will also look into the mb_ functions. I did not know about them
>  >before. Thanks.
>  >
>  >Dotan Cohen
>
>
> Dotan:
>
>  It really doesn't make any difference.
>
>  If you have a single character that is not ASCII, then it's something
>  beyond ASCII and you'll need to use the mb_functions.
>
>  Unicode contains all known characters (code points) including ASCII
>  with values equal to ASCII -- so there's no problem between code
>  points and ASCII.
>
>  The beyond ASCII string problem is basically what is a character? We
>  all know what an "a" is, but what about "a" with a "~" above it? Is
>  it one character or two? If it's a combination of two code points,
>  then it's a grapheme.
>
>  What about the character "fi" when it's combined? Is it one character
>  or two? In this case, it's a ligature and is a single code point.
>
>  So, when you are trying to count characters in a string, using ASCII
>  based functions won't work because they might count one character as
>  two and break the character in two parts. Or, the character might be
>  actually two characters, but they should be counted as one. As such,
>  mb_functions are designed to work with these types of problems where
>  as standard string functions won't.
>
>  The easy way to tell IF you should use mb_functions is if all the
>  characters you're working with appear in the ASCII table, then
>  standard string functions apply. However, if any of the characters
>  are not found in ASCII, then you need to go another route.
>
>  At least, that's my understanding.
>
>
>  Cheers,
>
>  tedd

Thank you Tedd, that was very helpful. After reading your mail from
yesterday I went to Wikipedia to learn what graphemes and ligatures
are. Your example of "fi" was there, otherwise I would have had no
idea that those letters can be combined. In Hebrew and Arabic,
especially, I can see how the vowel points (Hebrew) and combinations
like "LA" (Arabic) can confuse the ASCII function. Thanks.
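
For what it's worth, here is a minimal sketch of the counting discussed
above. It is only one possible approach, not necessarily what anyone in
this thread ended up using. It assumes the input is valid UTF-8 and a
PHP version with the "\u{...}" string escape (PHP 7 or later). The
helper names count_hebrew_letters() and count_code_points() are made up
for illustration. The sketch uses PCRE with the /u modifier rather than
the mb_ functions; where the mbstring extension is available,
mb_strlen($text, 'UTF-8') should give the same code point count.

```php
<?php
// Hypothetical helpers for illustration only; assumes valid UTF-8 input.

function count_hebrew_letters(string $text): int
{
    // \p{Hebrew} matches code points in the Hebrew script, which also
    // covers vowel points. The \p{L} part keeps only letters, so the
    // points are not counted. /u makes PCRE read the subject as UTF-8.
    $count = preg_match_all('/(?=\p{Hebrew})\p{L}/u', $text);
    return $count === false ? 0 : $count; // false means the match failed
}

function count_code_points(string $text): int
{
    // With /u, "." consumes one whole code point; /s lets it match newlines.
    $count = preg_match_all('/./su', $text);
    return $count === false ? 0 : $count;
}

// "shalom" written with vowel points, followed by some ASCII text.
$sample = "\u{05E9}\u{05B8}\u{05C1}\u{05DC}\u{05D5}\u{05B9}\u{05DD} abc";

echo strlen($sample), "\n";               // 18 (bytes, not characters)
echo count_code_points($sample), "\n";    // 11 (code points)
echo count_hebrew_letters($sample), "\n"; // 4  (Hebrew letters only)
```

The three numbers differ because each Hebrew code point takes two bytes
in UTF-8, and because the vowel points are separate code points that
combine with the preceding letter.
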
Dotan Cohen
http://what-is-what.com
http://gibberish.co.il
א-ב-ג-ד-ה-ו-ז-ח-ט-י-ך-כ-ל-ם-מ-ן-נ-ס-ע-ף-פ-ץ-צ-ק-ר-ש-ת

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
