RE: Need help with RegEx

"Brad Fuller" <bfuller@xxxxxxxxxxxxxxxx> · Mon, 11 Dec 2006 10:29:48 -0500

The example provided didn't work for me: it gave me back the same string,
unmodified.

I am also looking for a solution to extract the text from an XML response
I get after posting data to a remote server. I can do it with substring
functions, but I'd like something more compact and portable (a one-liner
that I could modify for other uses as well).

Example 1:
<someXMLtags>
	<status>16664 Rejected: Invalid LTV</status>
</someXMLtags>

Example 2:
<someXMLtags>
	<status>Unable to Post, Invalid Information</status>
</someXMLtags>

I want what is inside the <status> tags.

Does anyone have a working solution for getting the text from inside
these tags using regex?
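
A minimal sketch of such a one-liner, assuming PHP 7.1 or later and flat,
non-nested tags like the <status> examples above. The helper name
get_tag_text() is hypothetical, not a PHP built-in; it wraps preg_match(),
which returns 1 on a match and stores the first captured group in $m[1].

```php
<?php
// Hypothetical helper (not a PHP built-in): return the text between the first
// <$tag> (optionally with attributes) and the next </$tag>, or null if absent.
// Assumes flat, non-nested tags such as <status>...</status>.
function get_tag_text(string $xml, string $tag): ?string
{
    $t = preg_quote($tag, '/');  // escape any regex metacharacters in the name
    // /s lets . match newlines; the lazy .*? stops at the first closing tag.
    $pattern = '/<' . $t . '(?:\s[^>]*)?>(.*?)<\/' . $t . '>/s';
    return preg_match($pattern, $xml, $m) ? $m[1] : null;
}

$response = "<someXMLtags>\n\t<status>16664 Rejected: Invalid LTV</status>\n</someXMLtags>";
echo get_tag_text($response, 'status'), "\n"; // 16664 Rejected: Invalid LTV
```

Because the tag name is a parameter, the same call should work for other
simple tags in the response; nested or repeated tags would need more care.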

Much appreciated,

B

> -----Original Message-----
> From: Michael [mailto:michael@xxxxxxxxxxxxxx]
> Sent: Monday, December 11, 2006 6:59 AM
> To: Anthony Papillion
> Cc: php-general@xxxxxxxxxxxxx
> Subject: Re:  Need help with RegEx
> 
> At 01:02 AM 12/11/2006 , Anthony Papillion wrote:
> >Hello Everyone,
> >
> >I am having a bit of a problem wrapping my head around regular
> >expressions. I thought I had a good grip on them but, for some reason,
> >the expression I've created below simply doesn't work! Basically, I need
> >to retrieve all of the text between two unique and specific tags, but I
> >don't need the tag text. So let's say that the tag is
> >
> ><tag lang='ttt'>THIS IS A TEST</tag>
> >
> >I would need to retrieve THIS IS A TEST only and nothing else.
> >
> >Now, a bit more information: I am using cURL to retrieve the entire
> >contents of a webpage into a variable. I am then trying to perform the
> >following regular expression on the retrieved text:
> >
> >$trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");
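
As quoted, this call has three problems: the pattern begins with \/ rather
than the / delimiter (PHP rejects a backslash as a delimiter), no subject
string is passed, and preg_match() returns 1, 0 or false rather than the
matched text; the captures go into an optional third argument. A corrected
sketch, assuming the page is already in a variable (named $source_html here,
with a short stand-in value in place of the real page):

```php
<?php
// Stand-in for the page fetched with cURL (assumed sample markup).
$source_html = "<html><body><div id=result_box dir=ltr>THIS IS A TEST</div></body></html>";

// The pattern opens with the / delimiter, the subject is the second argument,
// and the captured group is read from $matches[1], not from the return value.
if (preg_match('/<div id=result_box dir=ltr>(.+?)<\/div>/s', $source_html, $matches)) {
    $trans_text = $matches[1];
} else {
    $trans_text = ''; // no match (0) or a pattern error (false)
}
echo $trans_text, "\n"; // THIS IS A TEST
```
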
> 
> Using the tags you describe here, and assuming the source html is in the
> variable $source_html, try this:
> 
> $trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);
> 
> How this breaks down:
> 
> opening quote for first parameter (your MATCH pattern).
> 
> open regex match pattern= /
> 
> first atom (.*?) = any or no leading text before <div id=result_box
> dir=ltr>; the ? makes it non-greedy, so it consumes as little as possible
> and stops at the first occurrence of the opening tag.
> 
> second atom (<div id=result_box dir=ltr>) = the opening tag you are
> looking for.
> 
> third atom (.*?) = the text you want to extract: all of the text between
> the 2nd and 4th atoms, even if there is nothing there.
> 
> fourth atom (<\/div>) = the closing tag of the div tag pair.
> 
> fifth atom (.*?) = all of the rest of the source html after the closing
> tag, up to the end of the line ^, even if there is nothing there.
> 
> close regex match pattern= /s
> 
> In order for this to work on html that may contain newlines, you must
> specify that the . can represent newline characters. This is done by
> adding the letter 's' after your regex's closing /, so the last thing in
> your regex match pattern would be /s.
> 
> end of string ^ (this matches the end of the string you are
> matching/replacing, $source_html)
> 
> closing quote for first parameter.
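
A likely reason this pattern hands back its input unchanged: without the m
modifier, ^ matches only at the very start of the subject, never at the end,
so a ^ placed after the opening tag can never match and preg_replace() has
nothing to replace. PCRE's end-of-subject anchor is $. A sketch comparing the
two on a stand-in string (the sample HTML is assumed, not the original page):

```php
<?php
// Assumed sample input standing in for the fetched page.
$source_html = "<p>intro</p>\n<div id=result_box dir=ltr>THIS IS A TEST</div>\n<p>outro</p>";

// As posted: the trailing ^ can only match at offset 0, so the pattern never
// matches and preg_replace() returns the subject unchanged.
$as_posted = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s', '$3', $source_html);

// With $ in place of ^, the whole subject is matched and replaced by group 3.
$fixed = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)$/s', '$3', $source_html);

var_dump($as_posted === $source_html); // bool(true)
echo $fixed, "\n";                     // THIS IS A TEST
```

Without the D modifier, $ also matches just before a final newline, so a
subject ending in "\n" would leave that newline on the result; \z anchors
strictly at the end. A preg_match() call that captures only the middle group
avoids the anchor question altogether.
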
> 
> The second parameter of preg_replace is the replacement: the number of
> the atom whose text should replace everything matched by the pattern in
> the first parameter. In this case the text we want is in the third atom,
> so this parameter would be $3. (This is the PHP way of back-referencing:
> if we wanted the text before the tag we would use atom 1, or $1; if we
> wanted the tag itself we would use $2; and so on. Basically it is a $
> followed by the number of the atom that holds the text we want to put
> into $trans_text in place of $source_html.)
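
To make the numbering concrete, a sketch on a short stand-in string (assumed,
not from the original page) using the same five-atom shape, with $ (PCRE's
end-of-subject anchor) as the final anchor:

```php
<?php
// Stand-in subject and a five-group pattern shaped like the one quoted above.
$s = 'before<b>inner</b>after';
$p = '/(.*?)(<b>)(.*?)(<\/b>)(.*?)$/s';

// The whole subject is matched, then replaced by the chosen group's text.
echo preg_replace($p, '$1', $s), "\n"; // before
echo preg_replace($p, '$2', $s), "\n"; // <b>
echo preg_replace($p, '$3', $s), "\n"; // inner

// ${3} marks where the group number ends, so the digit 1 after it stays literal.
echo preg_replace($p, '${3}1', $s), "\n"; // inner1
```

Single quotes keep PHP from trying to interpolate the $ in the replacement.
In a double-quoted "$3" this happens to be harmless, because a PHP variable
name cannot start with a digit.
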
> 
> The third parameter of preg_replace is the source you wish to match and
> replace from, in this case your source html in $source_html.
> 
> After this executes, $trans_text should contain the innerText of the
> <div id=result_box dir=ltr></div> tag pair from $source_html. If there is
> nothing between the opening and closing tags, $trans_text will == ""; if
> there is only a newline between the tags, $trans_text will == "\n".
> IMPORTANT: if the text between the tags contains a newline, $trans_text
> will also contain that newline character, because we told . to match
> newlines.
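
A quick way to check those cases, sketched with preg_match() capturing the
same inner atom on stand-in strings (assumed for illustration), with and
without the s modifier:

```php
<?php
$p_s  = '/<div id=result_box dir=ltr>(.*?)<\/div>/s'; // . also matches "\n"
$p_no = '/<div id=result_box dir=ltr>(.*?)<\/div>/';  // . does not match "\n"

// Nothing between the tags: the capture is an empty string.
preg_match($p_s, '<div id=result_box dir=ltr></div>', $m);
var_dump($m[1]); // string(0) ""

// Only a newline between the tags: the capture is "\n".
preg_match($p_s, "<div id=result_box dir=ltr>\n</div>", $m);
var_dump($m[1] === "\n"); // bool(true)

// Text spanning a line break matches only with /s, newline included.
$multi = "<div id=result_box dir=ltr>line one\nline two</div>";
var_dump(preg_match($p_s, $multi));  // int(1)
var_dump(preg_match($p_no, $multi)); // int(0)
```

If the surrounding newlines are not wanted, trim() on the captured text
removes them.
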
> 
> I am by no means a regex expert, but this worked for me (assuming I
> copied it correctly here, heh).
> There are doubtless many other ways to do this, and I am sure others on
> the list will correct me if my way is wrong or inefficient.
> 
> I hope this works for you and that I haven't horribly embarrassed myself
> here.
> Good luck :)
> 
> >
> >The problem is that when I echo the value of the $trans_text variable, I
> >end up with the entire HTML of the page.
> >
> >Can anyone clue me in to what I am doing wrong?
> >
> >Thanks,
> >Anthony
> >
> >--
> >PHP General Mailing List (http://www.php.net/)
> >To unsubscribe, visit: http://www.php.net/unsub.php
> >
