At 08:29 AM 12/11/2006 , Brad Fuller wrote:
>
>The example provided didn't work for me. It gave me the same string without
>anything modified.

You are absolutely correct; this is what I get for not testing it explicitly :(

My most sincere apologies to the OP and the list. There is an error in my
example (see below for the correction).

**** I have cut and pasted from further down in the quoted message, for convenience ****

>> Using the tags you describe here, and assuming the source html is in the
>> variable $source_html, try this:
>>
>> $trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);

The ^ symbol should not be included. It is actually the start-of-string anchor
(the end-of-string anchor is $), so placed after the closing tag it can never
match and the pattern finds nothing.
I tested the above function without the ^ and it worked for me.

Below is the TESTED version:

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s","$3",$source_html);

***** end of pasted section *****

>
>I am also looking for a solution like this to strip out text from an XML response
>I get from posting data to a remote server. I can do it using substring
>functions but I'd like something more compact and portable. (A one-liner
>that I could modify for other uses as well)
>
>Example 1:
><someXMLtags>
> <status>16664 Rejected: Invalid LTV</status>
></someXMLtags>
>
>Example 2:
><someXMLtags>
> <status>Unable to Post, Invalid Information</status>
></someXMLtags>
>
>I want what is inside the <status> tags.
>
>Does anyone have a working solution for getting the text from inside
>these tags using regex?
>
>Much appreciated,
>
>B
>
>> -----Original Message-----
>> From: Michael [mailto:michael@xxxxxxxxxxxxxx]
>> Sent: Monday, December 11, 2006 6:59 AM
>> To: Anthony Papillion
>> Cc: php-general@xxxxxxxxxxxxx
>> Subject: Re: Need help with RegEx
>>
>> At 01:02 AM 12/11/2006 , Anthony Papillion wrote:
>> >Hello Everyone,
>> >
>> >I am having a bit of a problem wrapping my head around regular expressions.
>> >I thought I had a good grip on them but, for some reason, the expression I've
>> >created below simply doesn't work! Basically, I need to retrieve all of the
>> >text between two unique and specific tags but I don't need the tag text. So
>> >let's say that the tag is
>> >
>> ><tag lang='ttt'>THIS IS A TEST</tag>
>> >
>> >I would need to retrieve THIS IS A TEST only and nothing else.
>> >
>> >Now, a bit more information: I am using cURL to retrieve the entire contents
>> >of a webpage into a variable. I am then trying to perform the following
>> >regular expression on the retrieved text:
>> >
>> >$trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");
>>
>> Using the tags you describe here, and assuming the source html is in the
>> variable $source_html, try this:
>>
>> $trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);

The ^ symbol should not be included. It is actually the start-of-string anchor
(the end-of-string anchor is $), so placed after the closing tag it can never
match and the pattern finds nothing.
I tested the above function without the ^ and it worked for me.

Below is the TESTED version:

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s","$3",$source_html);

>>
>> How this breaks down is:
>>
>> opening quote for first parameter (your MATCH pattern).
>>
>> open regex match pattern= /
>>
>> first atom (.*?) = any or no leading text before <div id=result_box dir=ltr>,
>> the ? makes it non-greedy so that it stops after finding the first match.
>>
>> second atom (<div id=result_box dir=ltr>) = the opening tag you are looking for.
>>
>> third atom (.*?) = the text you want to extract, all text even if nothing is
>> there, between the 2nd and 4th atoms.
>>
>> fourth atom (<\/div>) = the closing tag of the div tag pair.
>>
>> fifth atom (.*?) = all of the rest of the source html after the closing tag up
>> to the end of the line ^, even if there is nothing there.
>>
>> close regex match pattern= /s
>>
>> In order for this to work on html that may contain newlines, you must specify
>> that the . can represent newline characters. This is done by adding the letter
>> 's' after your regex closing /, so the last thing in your regex match pattern
>> would be /s.
>>
>> end of string ^ (this matches the end of the string you are matching/replacing,
>> $source_html)

Ignore this part of the explanation; the ^ is not needed and in fact breaks the example given.

>>
>> closing quote for first parameter.
>>
>> The second parameter of the preg_replace is the atom # which contains the text
>> you want to replace the text matched by the regex match pattern in the first
>> parameter. In this case the text we want is in the third atom, so this parameter
>> would be $3 (this is the PHP way of back-referencing: if we wanted the text
>> before the tag we would use atom 1, or $1; if we wanted the tag itself we would
>> use $2, etc. Basically, a $ followed by the atom # that holds the text we want
>> to carry over from $source_html into $trans_text).
>>
>> The third parameter of the preg_replace is the source you wish to match and
>> replace from, in this case your source html in $source_html.
>>
>> After this executes, $trans_text should contain the innerText of the <div
>> id=result_box dir=ltr></div> tag pair from $source_html. If there is nothing
>> between the opening and closing tags, $trans_text will == ""; if there is only
>> a newline between the tags, $trans_text will == "\n". IMPORTANT: if the text
>> between the tags contains a newline, $trans_text will also contain that newline
>> character because we told . to match newlines.
>>
>> I am no regex expert by far, but this worked for me (assuming I copied it
>> correctly here heh)
>> There are doubtless many other ways to do this, and I am sure others on the
>> list here will correct me if my way is wrong or inefficient.
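
As a side note, here is a minimal sketch of another way to pull the text out, assuming PHP 7.1 or later: capture the inner text with preg_match() instead of rewriting the whole string with preg_replace(). One caveat with the replace approach is that a lazy trailing (.*?) is allowed to match nothing, so any HTML after the closing tag can be left in the result. The helper name text_between() and the sample strings are made up for illustration and are not from the thread (the <status> sample follows Brad's Example 1).

```php
<?php
// Hypothetical helper (not from the original thread): returns the text
// between the first $open ... $close pair, or null when nothing matches.
function text_between(string $source, string $open, string $close): ?string
{
    // preg_quote() escapes regex metacharacters in the literal tags,
    // including the "/" delimiter passed as its second argument.
    // The "s" modifier lets "." match newlines; "(.*?)" is non-greedy.
    $pattern = '/' . preg_quote($open, '/') . '(.*?)' . preg_quote($close, '/') . '/s';

    // preg_match() returns 1 on a match and fills $m; $m[1] is the captured text.
    return preg_match($pattern, $source, $m) === 1 ? $m[1] : null;
}

// Illustrative input for the div case described in the thread.
$source_html = "<html><body>\n<div id=result_box dir=ltr>THIS IS A TEST</div>\n</body></html>";
$trans_text  = text_between($source_html, '<div id=result_box dir=ltr>', '</div>');

// Illustrative input modelled on the <status> example.
$xml    = "<someXMLtags>\n <status>16664 Rejected: Invalid LTV</status>\n</someXMLtags>";
$status = text_between($xml, '<status>', '</status>');

echo $trans_text, "\n"; // THIS IS A TEST
echo $status, "\n";     // 16664 Rejected: Invalid LTV
```

Because the tags are passed in as plain strings, the same call can be reused for other tag pairs. For anything beyond simple, non-nested tags, a real HTML or XML parser is likely the safer choice.
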
>>
>> I hope this works for you and that I haven't horribly embarrassed myself here.
>> Good luck :)
>>
>> >
>> >The problem is that when I echo the value of the $trans_text variable, I end up
>> >with the entire HTML of the page.
>> >
>> >Can anyone clue me in to what I am doing wrong?
>> >
>> >Thanks,
>> >Anthony
>> >
>> >--
>> >PHP General Mailing List (http://www.php.net/)
>> >To unsubscribe, visit: http://www.php.net/unsub.php
>> >