RE: Need help with RegEx

Michael <michael@xxxxxxxxxxxxxx> · Mon, 11 Dec 2006 12:43:15 -0700

At 08:29 AM 12/11/2006 , Brad Fuller wrote:
>
>The example provided didn't work for me.  It gave me the same string without
>anything modified.

You are absolutely correct; this is what I get for not testing it explicitly :( My most sincere apologies to the OP and the list. There is an error in my example (see below for the correction).

**** I have cut and pasted from further down in the quoted message, for convenience ****
>> Using the tags you describe here, and assuming the source html is in the
>> variable $source_html, try this:
>> 
>> $trans_text = preg_replace("/(.*?)(<div id=result_box
>> dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);

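The ^ should not be included. I intended it as the end-of-string symbol, but ^ actually anchors the start of the string ($ is the end-of-string anchor). I tested the above function without the ^ and it worked for me. Below is the TESTED version: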

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s","$3",$source_html);
***** end of pasted section *****
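
For anyone who wants to run the corrected call, here is a minimal, self-contained sketch. The sample page in $source_html is made up for illustration; only the div attributes come from the thread. One caveat worth checking against your own input: because the last (.*?) is non-greedy and nothing follows it, it matches an empty string, so any text after the closing </div> stays in the result. The second call uses a greedy (.*) at the end, which is a variation assumed here and not the tested line above, to drop that trailing text as well.

```php
<?php
// Hypothetical sample page; only the div id/dir attributes come from the thread.
$source_html = "<html><body>\n<div id=result_box dir=ltr>Hola mundo</div>\n</body></html>";

// The corrected call from this message (no ^ in the pattern).
$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s", "$3", $source_html);

// Variation (an assumption, not from the thread): a greedy final group
// also consumes everything after the closing tag.
$only_inner = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*)/s', '$3', $source_html);

var_dump($trans_text);
var_dump($only_inner);
```

If the div is the last thing in the string, both calls give the same result.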

>
>I am also looking for this solution to strip out text from some XML response
>I get from posting data to a remote server.  I can do it using substring
>functions but I'd like something more compact and portable. (A one-liner
>that I could modify for other uses as well)
>
>Example 1:
><someXMLtags>
>	<status>16664 Rejected: Invalid LTV</status>
></someXMLtags>
>
>Example 2:
><someXMLtags>
>	<status>Unable to Post, Invalid Information</status>
></someXMLtags>
>
>I want what is inside the <status> tags.
>
>Does anyone have a working solution how we can get the text from inside
>these tags using regex?
>
>Much appreciated,
>
>B
>
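
For the <status> case, one possible compact approach is preg_match with a single capture group, sketched below. The status_text() helper name and the $xml_response variable are hypothetical; the sample XML is Example 1 from above. This assumes one <status> element with no attributes; for anything more involved, a real XML parser is likely the sturdier choice.

```php
<?php
// Hypothetical helper: returns the text inside the first <status> element,
// or null when no such element is found.
function status_text($xml_response)
{
    // The s modifier lets . match newlines in case the status text spans lines.
    if (preg_match('/<status>(.*?)<\/status>/s', $xml_response, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

$xml_response = "<someXMLtags>\n\t<status>16664 Rejected: Invalid LTV</status>\n</someXMLtags>";
echo status_text($xml_response), "\n";
```

Changing the tag name in the pattern is all it takes to reuse the helper for other elements.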
>> -----Original Message-----
>> From: Michael [mailto:michael@xxxxxxxxxxxxxx]
>> Sent: Monday, December 11, 2006 6:59 AM
>> To: Anthony Papillion
>> Cc: php-general@xxxxxxxxxxxxx
>> Subject: Re:  Need help with RegEx
>> 
>> At 01:02 AM 12/11/2006 , Anthony Papillion wrote:
>> >Hello Everyone,
>> >
>> >I am having a bit of problems wrapping my head around regular
>> >expressions. I thought I had a good grip on them but, for some reason, the
>> >expression I've created below simply doesn't work! Basically, I need to
>> >retrieve all of the text between two unique and specific tags but I don't
>> >need the tag text. So let's say that the tag is
>> >
>> ><tag lang='ttt'>THIS IS A TEST</tag>
>> >
>> >I would need to retrieve THIS IS A TEST only and nothing else.
>> >
>> >Now, a bit more information: I am using cURL to retrieve the entire contents
>> >of a webpage into a variable. I am then trying to perform the following
>> >regular expression on the retrieved text:
>> >
>> >$trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");
>> 
>> Using the tags you describe here, and assuming the source html is in the
>> variable $source_html, try this:
>> 
>> $trans_text = preg_replace("/(.*?)(<div id=result_box
>> dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);

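The ^ should not be included. I intended it as the end-of-string symbol, but ^ actually anchors the start of the string ($ is the end-of-string anchor). I tested the above function without the ^ and it worked for me. Below is the TESTED version: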

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s","$3",$source_html);

>> 
>> how this breaks down is:
>> 
>> opening quote for first parameter (your MATCH pattern).
>> 
>> open regex match pattern= /
>> 
>> first atom (.*?) = any or no leading text before <div id=result_box dir=ltr>;
>> the ? makes it non-greedy so that it stops after finding the first match.
>> 
>> second atom (<div id=result_box dir=ltr>) = the opening tag you are looking for.
>> 
>> third atom (.*?) = the text you want to strip out, all text even if nothing is
>> there, between the 2nd and 4th atoms.
>> 
>> fourth atom (<\/div>) = the closing tag of the div tag pair.
>> 
>> fifth atom (.*?) = all of the rest of the source html after the closing tag up
>> to the end of the line ^, even if there is nothing there.
>> 
>> close regex match pattern= /s
>> 
>> in order for this to work on html that may contain newlines, you must specify
>> that the . can represent newline characters; this is done by adding the letter
>> 's' after your regex closing /, so the last thing in your regex match pattern
>> would be /s.
>> 
>> end of string ^ (this matches the end of the string you are matching/replacing,
>> $source_html)

Ignore this part of the explanation: the ^ is not needed and in fact breaks the example given.
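
To make the anchor behaviour concrete: in PCRE, ^ anchors the start of the subject and $ anchors the end, so a ^ placed at the end of this pattern can never match and preg_replace hands back the input unchanged. A small sketch with a made-up subject string follows; the $ variant is an assumption added here for comparison, not something from the original example. Single-quoted strings are used so PHP does not try to interpolate the $.

```php
<?php
$subject = "abc<div id=result_box dir=ltr>text</div>xyz"; // made-up sample

// ^ is required after text has already been consumed, so nothing matches
// and the subject comes back untouched.
$with_caret = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s', '$3', $subject);

// $ at the end forces the final group to run to the end of the string.
$with_dollar = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)$/s', '$3', $subject);

var_dump($with_caret, $with_dollar);
```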

>> 
>> closing quote for first parameter.
>> 
>> The second parameter of the preg_replace is the atom # which contains the text
>> you want to replace the text matched by the regex match pattern in the first
>> parameter. In this case the text we want is in the third atom, so this parameter
>> would be $3. (This is the PHP way of back-referencing: if we wanted the text
>> before the tag we would use atom 1, or $1; if we wanted the tag itself we would
>> use $2, etc. Basically, a $ followed by the atom # that holds what we want to
>> copy from $source_html into $trans_text.)
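
A quick way to see what each numbered group holds is to swap the replacement string. The sketch below uses a made-up one-line subject in which the div is the last thing in the string; $1, $2 and $3 are the back-references described above.

```php
<?php
$subject = "before<div id=result_box dir=ltr>inner</div>"; // made-up sample
$pattern = '/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s';

$g1 = preg_replace($pattern, '$1', $subject); // text before the opening tag
$g2 = preg_replace($pattern, '$2', $subject); // the opening tag itself
$g3 = preg_replace($pattern, '$3', $subject); // text between the tags

echo $g1, "\n", $g2, "\n", $g3, "\n";
```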
>> 
>> The third parameter of the preg_replace is the source you wish to match and
>> replace from, in this case your source html in $source_html.
>> 
>> After this executes, $trans_text should contain the innerText of the
>> <div id=result_box dir=ltr></div> tag pair from $source_html. If there is
>> nothing between the opening and closing tags, $trans_text will == "". If there
>> is only a newline between the tags, $trans_text will == "\n". IMPORTANT: if the
>> text between the tags contains a newline, $trans_text will also contain that
>> newline character because we told . to match newlines.
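
The newline and empty-content cases can be checked with a short sketch. The sample strings are made up, and each contains nothing but the div so that the result is only the inner text.

```php
<?php
$multi = "<div id=result_box dir=ltr>line one\nline two</div>"; // made-up samples
$empty = "<div id=result_box dir=ltr></div>";

$with_s    = '/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s';
$without_s = '/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/';

$a = preg_replace($with_s, '$3', $multi);    // newline inside the div is kept
$b = preg_replace($without_s, '$3', $multi); // . stops at "\n", so nothing matches
$c = preg_replace($with_s, '$3', $empty);    // nothing between the tags

var_dump($a, $b, $c);
```

Without the s modifier the multi-line input is returned unchanged, which can look the same as the regex not running at all.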
>> 
>> I am no regex expert by far, but this worked for me (assuming I copied it
>> correctly here heh).
>> There are doubtless many other ways to do this, and I am sure others on the
>> list here will correct me if my way is wrong or inefficient.
>> 
>> I hope this works for you and that I haven't horribly embarrassed myself here.
>> Good luck :)
>> 
>> >
>> >The problem is that when I echo the value of $trans_text variable, I end up
>> >with the entire HTML of the page.
>> >
>> >Can anyone clue me in to what I am doing wrong?
>> >
>> >Thanks,
>> >Anthony
>> >
>> >--
>> >PHP General Mailing List (http://www.php.net/)
>> >To unsubscribe, visit: http://www.php.net/unsub.php
>> >