PHP list,
While I'm only just learning about regular expressions in another
thread, I keep running into exceptional situations that have me
wondering how far the preg functions can be pushed.
(The following contains UTF-8 encoded Japanese text. Apologies if it
comes out as ASCII gibberish.)
What I have are sentences that look like this:
気温 【きおん】 (n) atmospheric temperature; (P); EP
について (exp) concerning; along; under; per; KD
I want to divide the first line into three variables, $word, $reading,
and $meaning. And I want to divide the second line into two variables,
$word and $meaning.
If I can figure out how to extract the first variable, $word, then I can
figure out the rest. But that first step seems to be a doozy.
The way I see it, I could do it two ways. One is to pull out all the
characters up to the first occurrence of a space, and assume that it's
Japanese. Not that I'm sure how to write that expression, but maybe I
could.
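
For reference, this first approach might be sketched as follows. It
assumes the fields are separated by ordinary whitespace and that the
reading, when there is one, is wrapped in 【 】 brackets, as in the two
sample lines above. The function name split_entry() is made up for
illustration.

```php
<?php
// Hypothetical helper: split one dictionary line into word, reading, meaning.
// Returns null when the line does not have the expected shape.
function split_entry($line)
{
    // ^(\S+)            word: everything up to the first whitespace
    // (?:\s+【(.+?)】)?  optional reading inside the 【 】 brackets
    // \s+(.*)$          the rest of the line is the meaning
    // The /u modifier tells PCRE to treat pattern and subject as UTF-8.
    if (!preg_match('/^(\S+)(?:\s+【(.+?)】)?\s+(.*)$/u', $line, $m)) {
        return null;
    }

    return [
        'word'    => $m[1],
        // An unmatched optional group comes back as an empty string.
        'reading' => $m[2] !== '' ? $m[2] : null,
        'meaning' => $m[3],
    ];
}
```

Calling split_entry() on the second sample line should give a null
reading, since that line has no bracketed part.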
But it seems like it would be a lot more sophisticated if I could
determine if a word was Japanese by testing its Unicode value or some
similar method. That way I would be less vulnerable to slight
variations in the positioning of words in the source material.
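
One possible sketch of such a test uses PCRE's Unicode script
properties, which the preg functions accept when the pattern carries the
/u modifier. The name is_japanese_word() is hypothetical, and "Japanese"
here is only an approximation: the Han script is shared with Chinese, so
this really tests for "kanji or kana".

```php
<?php
// Hypothetical helper: true when every character in $text is kanji or kana.
function is_japanese_word($text)
{
    // \p{Han}      kanji (also matches Chinese-only characters)
    // \p{Hiragana} hiragana
    // \p{Katakana} katakana
    // ー (U+30FC, the long vowel mark) is listed explicitly because Unicode
    // assigns it to the Common script rather than to Katakana.
    // \z anchors at the very end of the string; /u enables UTF-8 mode.
    return preg_match('/^[\p{Han}\p{Hiragana}\p{Katakana}ー]+\z/u', $text) === 1;
}
```

With this, is_japanese_word('気温') would be true and
is_japanese_word('atmospheric') would be false.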
Looking at all the multibyte related functions in the PHP manual, it
seems there are options for testing the type of encoding, but not for
the type of language or character set.
http://jp2.php.net/manual/en/ref.mbstring.php
However, I could be wrong about this (and it would be nice if I was).
Searching the web, I came across this guy's script to test if characters
were above the usual ASCII range in Unicode, and could therefore be
assumed to be Japanese:
http://www.randomchaos.com/documents/?source=php_and_unicode
But this seems unwieldy, as I think, if I understand it correctly, I'd
have to test each individual word. I could use it to test if there was
any Japanese at all in a string, but I'm not confident I could use it to
extract words.
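
As a rough sketch of extraction rather than detection, preg_match_all()
with a character class of Unicode scripts could collect every run of
kanji or kana in a line without testing word by word. The function name
extract_japanese_runs() is made up, and the approach assumes the input
is valid UTF-8.

```php
<?php
// Hypothetical helper: return every run of kanji/kana found in $text.
function extract_japanese_runs($text)
{
    // Each match is one or more consecutive Han, Hiragana, or Katakana
    // characters. ー (U+30FC) is added by hand because Unicode files it
    // under the Common script. /u makes PCRE read the subject as UTF-8.
    preg_match_all('/[\p{Han}\p{Hiragana}\p{Katakana}ー]+/u', $text, $matches);

    // $matches[0] holds the full-pattern matches, in order of appearance.
    return $matches[0];
}
```

The 【 】 brackets are not part of any of these scripts, so they act as
separators between runs rather than being captured.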
So I'm a little stuck. If anyone has any advice to help get me started,
it would be much appreciated.
Thank you for your time and help.
--
Dave M G