Chian Hsieh wrote:
Hi,

I want to extract every <embed> and <object> element, with or without a closing tag. My current solution uses regular expressions, but there is a case I cannot handle. The patterns I use are:

    // With closing tag
    if (preg_match_all("#(<(object|embed)[^>]+>.*?</\\2>)#is", $str, $matchObjs)) {
        // blahblah
    // Without closing tag
    } else if (preg_match_all("#(<(?:object|embed)[^>]+>)#", $str, $matchObjs)) {
        // blahblah
    }

This fails when $str mixes elements with and without closing tags:

    $str = '<div><div><object type="application/x-shockwave-flash"><param name="zz" value="xx"></object></div><div><embed src="http://sample.com" /></div>';

In this situation it only returns

    <object type="application/x-shockwave-flash"><param name="zz" value="xx"></object>

but I want both results:

    <object type="application/x-shockwave-flash"><param name="zz" value="xx"></object>
    <embed src="http://sample.com" />

So, is there a good way to handle this with a single regex?
If you're open to using methods other than regex, one way to get pretty good results is to run the document through HTML Tidy, parse it into a DOM, and query it using XPath/XQuery. That basically mimics the way browsers handle markup (and the way recommended by the HTML specs).
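A minimal sketch of the DOM and XPath part, assuming the standard dom extension is available. The function name extractEmbeds is hypothetical, and the Tidy step is left out here because libxml's HTML parser already tolerates this sample; where the tidy extension is installed, the input could be passed through tidy_repair_string() first. The XPath query also skips any <embed> nested inside an <object>, on the assumption that the outer element is the one wanted.

```php
<?php
// Hypothetical helper: returns the serialized markup of every <object>
// element and every <embed> element that is not nested inside an <object>.
function extractEmbeds(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them,
    // since real-world markup is often unbalanced or non-standard.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $found = [];

    // A union query returns the matching nodes in document order.
    foreach ($xpath->query('//object | //embed[not(ancestor::object)]') as $node) {
        // Serialize the element together with its children.
        $found[] = $doc->saveHTML($node);
    }

    return $found;
}

$str = '<div><div><object type="application/x-shockwave-flash"><param name="zz" value="xx"></object></div><div><embed src="http://sample.com" /></div>';

$matches = extractEmbeds($str);
print_r($matches);
```

Note that the strings come back re-serialized by libxml rather than copied byte for byte from the input, so details such as the trailing slash on a self-closing tag may differ from the original markup.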
Best,
Nathan

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php