Re: Need help with RegEx

At 01:02 AM 12/11/2006 , Anthony Papillion wrote:
>Hello Everyone,
>
>I am having a bit of a problem wrapping my head around regular expressions. I 
>thought I had a good grip on them but, for some reason, the expression I've 
>created below simply doesn't work! Basically, I need to retrieve all of the 
>text between two unique and specific tags but I don't need the tag text. So 
>let's say that the tag is
>
><tag lang='ttt'>THIS IS A TEST</tag>
>
>I would need to retrieve THIS IS A TEST only and nothing else.
>
>Now, a bit more information: I am using cURL to retrieve the entire contents 
>of a webpage into a variable. I am then trying to perform the following 
>regular expression on the retrieved text:
>
>$trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");

Using the tags you describe here, and assuming the source HTML is in the
variable $source_html, try this:

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)$/s","$3",$source_html);

How this breaks down:

opening quote for first parameter (your MATCH pattern).

open regex match pattern = /

first atom (.*?) = any or no leading text before <div id=result_box dir=ltr>;
the ? makes it non-greedy, so it stops at the first occurrence of the tag.

second atom (<div id=result_box dir=ltr>) = the opening tag you are looking for.

third atom (.*?) = the text you want to pull out: all text, even if nothing is
there, between the 2nd and 4th atoms.

fourth atom (<\/div>) = the closing tag of the div tag pair.

fifth atom (.*?) = all of the rest of the source HTML after the closing tag, up
to the end of the string $, even if there is nothing there.

end of string $ = this matches the end of the string you are matching/replacing
($source_html). $ also matches just before a single trailing newline, so a
final newline in $source_html can survive the replace; trim() the result if
that matters.

close regex match pattern = /s

In order for this to work on HTML that may contain newlines, you must specify
that the . can match newline characters. This is done by adding the letter
's' after your regex closing /, so the last thing in your regex match pattern
would be /s.

closing quote for first parameter.
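
To see what the s modifier changes, here is a small sketch. The sample string is made up for illustration, and the pattern is a shortened version of the one above that keeps only the tag pair and the text between it.

```php
<?php
// Hypothetical sample input: the div body spans two lines.
$sample = "<div id=result_box dir=ltr>line one\nline two</div>";

$pattern_without_s = '/<div id=result_box dir=ltr>(.*?)<\/div>/';
$pattern_with_s    = '/<div id=result_box dir=ltr>(.*?)<\/div>/s';

// Without s, . cannot cross the "\n", so the closing tag is never reached.
$found_without_s = preg_match($pattern_without_s, $sample, $m1);

// With s, . also matches "\n", so the whole two-line body is captured.
$found_with_s = preg_match($pattern_with_s, $sample, $m2);

var_dump($found_without_s); // int(0)
var_dump($found_with_s);    // int(1)
var_dump($m2[1]);           // the two-line body
```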

The second parameter of preg_replace is the replacement text for whatever the
regex match pattern in the first parameter matched. Because the pattern is
written to match the whole of $source_html, the replacement becomes the whole
result. The text we want is in the third atom, so this parameter is $3. This
is the PHP way of back-referencing: a $ followed by the atom # that holds what
we want. If we wanted the text before the tag we would use atom 1, or $1; if
we wanted the opening tag itself we would use $2, etc.

The third parameter of the preg_replace is the source you wish to match and
replace from, in this case your source html in $source_html.
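
As a rough illustration of the three parameters and of which atom each back-reference picks out, here is a sketch. The one-line $source_html is a made-up stand-in for a real page, and the pattern anchors the end of the string with $.

```php
<?php
// Hypothetical one-line page, just to show which atom holds what.
$source_html = 'before<div id=result_box dir=ltr>THIS IS A TEST</div>after';

// Five atoms: leading text, opening tag, inner text, closing tag, trailing text.
$pattern = '/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)$/s';

// The pattern matches the whole string, so the result is just the chosen atom.
$before   = preg_replace($pattern, '$1', $source_html); // text before the tag
$open_tag = preg_replace($pattern, '$2', $source_html); // the opening tag itself
$inner    = preg_replace($pattern, '$3', $source_html); // text between the tags

var_dump($before, $open_tag, $inner);
```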

After this executes, $trans_text should contain the inner text of the <div
id=result_box dir=ltr></div> tag pair from $source_html. If there is nothing
between the opening and closing tags, $trans_text will be "". If there is only
a newline between the tags, $trans_text will be "\n". IMPORTANT: if the text
between the tags contains a newline, $trans_text will also contain that newline
character, because we told . to match newlines.
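
The cases above can be checked with a short sketch. The helper name extract_result_box and the sample strings are hypothetical, and the pattern anchors the end of the string with $. The last case shows one more thing worth knowing: when the div is not found at all, preg_replace has nothing to replace and hands back the input unchanged, which looks like "the entire HTML of the page".

```php
<?php
// Hypothetical helper wrapping the preg_replace call described above.
function extract_result_box($source_html)
{
    $pattern = '/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)$/s';
    return preg_replace($pattern, '$3', $source_html);
}

// Nothing between the tags.
$empty = extract_result_box('<p>x</p><div id=result_box dir=ltr></div><p>y</p>');

// Only a newline between the tags.
$newline = extract_result_box("<div id=result_box dir=ltr>\n</div>");

// No matching div: the pattern does not match, so the input comes back as-is.
$missing = extract_result_box('<p>no such div here</p>');

var_dump($empty, $newline, $missing);
```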

I am no regex expert by far, but this worked for me (assuming I copied it
correctly here heh)
There are doubtless many other ways to do this, and I am sure others on the
list here will correct me if my way is wrong or inefficient.
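
As one of those other ways, here is a sketch that stays closer to the original preg_match attempt. It is not what I ran above, just an alternative under the same assumption that the tag appears exactly as <div id=result_box dir=ltr>. preg_match takes the subject as its second argument and fills an array, passed as the third argument, with the captured groups; the sample $source_html is made up.

```php
<?php
// Hypothetical stand-in for the page fetched with cURL.
$source_html = "<html><body>\n<div id=result_box dir=ltr>THIS IS A TEST</div>\n</body></html>\n";

$trans_text = null; // stays null when the div is not found

// preg_match returns 1 on a match, 0 on no match, false on error.
if (preg_match('/<div id=result_box dir=ltr>(.*?)<\/div>/s', $source_html, $matches) === 1) {
    $trans_text = $matches[1]; // group 1 is the text between the tags
}

var_dump($trans_text);
```

With this form there is no need to match the text before and after the div, and a page without the div leaves $trans_text as null instead of returning the whole page.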

I hope this works for you and that I haven't horribly embarrassed myself here.
Good luck :)

>
>The problem is that when I echo the value of $trans_text variable, I end up 
>with the entire HTML of the page.
>
>Can anyone clue me in to what I am doing wrong?
>
>Thanks,
>Anthony 
>
>-- 
>PHP General Mailing List (http://www.php.net/)
>To unsubscribe, visit: http://www.php.net/unsub.php
>  
