Dave M G wrote:
> Jochem,
>
> Thank you for responding, and for explaining more about regular
> expressions.
>
>> yes but you wouldn't use preg_replace() but rather preg_match() or
>> preg_match_all()
>> which gives you back an array (via 3rd/4th[?] reference argument)
>> which contains
>> the texts that matched (and therefore want to keep).
> I looked up preg_match_all() on php.net, and, in combination with what
> was said before, came up with this syntax:
>
> preg_match_all( "#^<li[^>]*>(.*)<br[^>]*>#is", $response, $wordList,
                    ^--- remove the caret: without the m modifier it anchors
                         the match to the very start of $response, and the
                         <li> can be anywhere in the page

I'll assume you also have the mb extension set up.

> PREG_PATTERN_ORDER );
> var_dump($wordList);
>
> The idea is to catch all text between <li> and <br> tags.
>
> Unfortunately, the result I get from var_dump is:
>
> array(2) { [0]=> array(0) { } [1]=> array(0) { } }
>
> In other words, it made no matches.
>
> The text being searched is an entire web page which contains the following:
> (Please note the following includes utf-8 encoded Japanese text.
> Apologies if it comes out as ASCII gibberish)
>
> <FONT color="red">日本語</FONT>は<FONT color="red">簡単</FONT>だよ<br>
> <ul><li> 日本語 【にほんご】 (n) Japanese language; (P); EP <br>
> <li> 簡単 【かんたん】 (adj-na,n) simple; (P); EP <br>
> </ul><p>
>
> So, my preg_match_all search should have found:
>
> 日本語 【にほんご】 (n) Japanese language; (P); EP
> 簡単 【かんたん】 (adj-na,n) simple; (P); EP
>
> I've checked and rechecked my syntax, and I can't see why it would fail.
>
> Have I messed up the regular expression, or the use of preg_match_all?
>
> --
> Dave M G

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
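A minimal PHP sketch of the corrected call discussed in this thread follows. It assumes $response holds the page as UTF-8 text (here, the sample HTML from Dave's message embedded as a heredoc) and wraps the call in a hypothetical helper, extractListItems(), which is not a PHP built-in. Besides dropping the caret, it makes the group lazy, (.*?): with the s modifier a greedy (.*) runs from the first <li> to the last <br> and yields one match instead of one per item. It also adds the u modifier so PCRE treats the pattern and subject as UTF-8. The two counts at the end compare the original pattern (no match) with an unanchored but still greedy variant (a single match).

```php
<?php
// Minimal sketch (assumption: $response holds the fetched page as UTF-8 text).
// extractListItems() is a hypothetical helper, not a PHP built-in.

/**
 * Return the text between each <li ...> tag and the next <br ...> tag.
 */
function extractListItems(string $html): array
{
    // No leading ^: the <li> may appear anywhere in the subject.
    // (.*?) is lazy, so each match stops at the nearest <br>.
    // i: case-insensitive tags; s: "." also matches newlines;
    // u: treat pattern and subject as UTF-8.
    $pattern = '#<li[^>]*>(.*?)<br[^>]*>#isu';

    $matches = [];
    if (preg_match_all($pattern, $html, $matches, PREG_PATTERN_ORDER) === false) {
        // Regex failure, e.g. invalid UTF-8 in the subject with the u modifier.
        return [];
    }

    // $matches[0] holds the whole matches; $matches[1] holds capture group 1.
    return array_map('trim', $matches[1]);
}

// Sample page fragment taken from the message above.
$response = <<<HTML
<FONT color="red">日本語</FONT>は<FONT color="red">簡単</FONT>だよ<br>
<ul><li> 日本語 【にほんご】 (n) Japanese language; (P); EP <br>
<li> 簡単 【かんたん】 (adj-na,n) simple; (P); EP <br>
</ul><p>
HTML;

$wordList = extractListItems($response);
var_dump($wordList);

// For comparison: the original pattern. Without the m modifier, ^ anchors to
// the start of the whole subject, which begins with <FONT, so nothing matches.
$anchoredCount = preg_match_all('#^<li[^>]*>(.*)<br[^>]*>#is', $response, $unused);
var_dump($anchoredCount); // int(0)

// Unanchored but greedy: (.*) runs to the last <br>, giving a single match.
$greedyCount = preg_match_all('#<li[^>]*>(.*)<br[^>]*>#is', $response, $unused);
var_dump($greedyCount); // int(1)
```

PREG_PATTERN_ORDER is already the default flag, so passing it explicitly is optional; either way $matches[1] is the list of captured texts, one entry per <li> item.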