On Fri, Oct 23, 2009 at 1:54 PM, Israel Ekpo <israelekpo@xxxxxxxxx> wrote:
>
>
> On Fri, Oct 23, 2009 at 1:48 PM, Ashley Sheridan <ash@xxxxxxxxxxxxxxxxxxxx>
> wrote:
>>
>> On Fri, 2009-10-23 at 13:45 -0400, Brad Fuller wrote:
>>
>> > On Fri, Oct 23, 2009 at 1:28 PM, Ashley Sheridan
>> > <ash@xxxxxxxxxxxxxxxxxxxx> wrote:
>> >
>> > > On Fri, 2009-10-23 at 13:23 -0400, Brad Fuller wrote:
>> > >
>> > > I'm looking for a regular expression to accomplish a specific task.
>> > >
>> > > I'm hoping someone who's really good at regex patterns can lend a
>> > > quick hand.
>> > >
>> > > I need a regex pattern that will grab URLs out of HTML links that have
>> > > a certain link text (i.e. the word "Continue").
>> > >
>> > > This is what I have so far, but it does not work properly (if there are
>> > > other attributes in the <a> tag, it returns them as part of the URL):
>> > >
>> > > preg_match_all('#<a[\s]+[^>]*href\s*=\s*([\"\']+)([^>]+?)(\1|>)>Continue</a>#i', $html, $matches);
>> > >
>> > > It needs to be able to extract the URL and disregard arbitrary
>> > > attributes in the HTML tag.
>> > >
>> > > Test it with the following examples:
>> > >
>> > > <a href=/path/to/url.html>Continue</a>
>> > > <a href='/path/to/url.html'>Continue</a>
>> > > <a href="http://example.com/path/to/url.html" class="link">Continue</a>
>> > > <a style="font-size: 12px" href="http://example.com/path/to/url.html" onclick="someFunction('foo','bar')">Continue</a>
>> > >
>> > > Please reply.
>> > >
>> > > Your help is much appreciated.
>> > >
>> > > Thanks in advance,
>> > > Brad F.
>> > >
>> > >
>> > > preg_match_all('#<a[\s]+[^>]*href\s*=\s*[\"\']+([^\"\']+?).+?>Continue</a>#i', $html, $matches);
>> > >
>> > > I just changed your regex a bit. What your regex was previously doing
>> > > was matching everything from the first quote after the href= right up
>> > > until the first > it found, which would usually be the one that closes
>> > > the opening tag. You could make it a bit more intelligent if you wished
>> > > by using a backreference to make sure it matches against the same type
>> > > of quotation character it matched at the start of the href's value.
>> > >
>> > > Thanks,
>> > > Ash
>> > > http://www.ashleysheridan.co.uk
>> >
>> > I appreciate the help. However, when I try this I only get the first
>> > character of the URL. Can you double-check it, please?
>> >
>> > Thanks again
>>
>>
>> I think it's probably the first ? in ([^\"\']+?)
>>
>> Remove that and it should do the trick.
>>
>> Thanks,
>> Ash
>> http://www.ashleysheridan.co.uk
>>
>
> Hi Brad,
>
> I agree with Jim.
>
> Take a look at this. It might help.
>
> <?php
>
> $xml_string = <<<TEXT_BOUNDARY
> <html>
> <head>
> <title></title>
> </head>
> <body>
> <div>
> <a href="http://example.com/path/to/urlA.html">Continue</a>
> <a href="http://example.com/path/to/url2.html">Brad Fuller</a>
> <a href="http://example.com/path/to/urlB.html">Continue</a>
> <a href="http://example.com/path/to/url4.html">PHP.net</a>
> <a href="http://example.com/path/to/urlC.html" class="link">Continue</a>
> <a style="font-size: 12px" href="http://example.com/path/to/urlD.html" onclick="someFunction('foo','bar')">Continue</a>
> </div>
> </body>
> </html>
> TEXT_BOUNDARY;
>
> $xml = simplexml_load_string($xml_string);
>
> $continue_hrefs = $xml->xpath("//a[text() = 'Continue']/@href");
>
> print_r($continue_hrefs);
>
> ?>
>

Thanks, I'm sure I will use this at some point in the future :)

--
PHP General Mailing List (http://www.php.net/)
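Following up on Ash's suggestion to use a backreference, here is a minimal sketch (not from anyone in the thread) of a pattern that accepts double-quoted, single-quoted and unquoted href values, checked against Brad's four test links plus one non-"Continue" link. The function name extract_continue_urls() is hypothetical, and the type declarations and ?? operator assume PHP 7 or later.

```php
<?php
// Minimal sketch: collect the href of every <a> element whose link text is
// exactly "Continue". extract_continue_urls() is a hypothetical name.
//
// The href value may take one of two forms:
//   (["'])          group 1: the opening quote character
//   ((?:(?!\1).)*)  group 2: everything up to the SAME quote character (\1)
//   ([^\s>"']+)     group 3: an unquoted value, ending at whitespace or '>'
function extract_continue_urls(string $html): array
{
    $pattern = '#<a\s[^>]*?\bhref\s*=\s*(?:(["\'])((?:(?!\1).)*)\1|([^\s>"\']+))[^>]*>Continue</a>#is';

    $urls = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Unmatched groups come back as '' (or are omitted when trailing),
            // so take whichever of group 2 / group 3 actually captured text.
            $urls[] = ($m[2] ?? '') !== '' ? $m[2] : ($m[3] ?? '');
        }
    }
    return $urls;
}

// Brad's four test links plus one link with different text.
$html = <<<'HTML'
<a href=/path/to/url.html>Continue</a>
<a href='/path/to/url.html'>Continue</a>
<a href="http://example.com/path/to/url.html" class="link">Continue</a>
<a href="http://example.com/path/to/url2.html">Brad Fuller</a>
<a style="font-size: 12px" href="http://example.com/path/to/url.html" onclick="someFunction('foo','bar')">Continue</a>
HTML;

print_r(extract_continue_urls($html));
```

A regex is still only an approximation of an HTML parser; for example, <a href="..."><b>Continue</b></a> will not match. Israel's XPath approach is more robust, but simplexml_load_string() requires well-formed XML, so it would reject Brad's unquoted href=/path/to/url.html example; DOMDocument::loadHTML() together with DOMXPath is more forgiving of that kind of markup.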