Brad Fuller wrote:
> I'm looking for a regular expression to accomplish a specific task.
>
> I'm hoping someone who's really good at regex patterns can lend a quick hand.
>
> I need a regex pattern that will grab URLs out of HTML that have a
> certain link text. (i.e. the word "Continue")
>
> This is what I have so far but it does not work properly (If there are
> other attributes in the <a> tag it returns them as part of the URL.)
>
> preg_match_all('#<a[\s]+[^>]*href\s*=\s*([\"\']+)([^>]+?)(\1|>)>Continue</a>#i',
> $html, $matches);
>
> It needs to be able to extract the URL and disregard arbitrary
> attributes in the HTML tag
>
> Test it with the following examples:
>
> <a href=/path/to/url.html>Continue</a>
> <a href='/path/to/url.html'>Continue</a>
> <a href="http://example.com/path/to/url.html" class="link">Continue</a>
> <a style="font-size: 12px" href="http://example.com/path/to/url.html"
> onlick="someFunction('foo','bar')">Continue</a>
>
> Please reply
>
> Your help is much appreciated.
>
> Thanks in advance,
> Brad F.
>

Looking at this from an XML standpoint, I can see this being done rather
easily without having to use regex at all. You might look into using
DOMDocument and SimpleXML to complete the task.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
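For illustration, here is a minimal sketch of the DOM-based approach suggested in the reply, run against the test links from the original message. The function name extractLinksByText is hypothetical, and the extra "Back" link is added only to show that non-matching link text is skipped. The comparison is case-insensitive to mirror the #i flag in the posted pattern. Treat this as one possible way to do it, not a definitive implementation.

```php
<?php
// Hypothetical helper (not from the original thread): returns the href of
// every <a> element whose trimmed link text equals $linkText.
function extractLinksByText(string $html, string $linkText): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely well-formed, so collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Other attributes (class, style, ...) are ignored; only href is read.
        if ($anchor->hasAttribute('href')
            && strcasecmp(trim($anchor->textContent), $linkText) === 0) {
            $urls[] = $anchor->getAttribute('href');
        }
    }
    return $urls;
}

// The test links from the original message, plus one non-matching link.
$html = <<<'HTML'
<a href=/path/to/url.html>Continue</a>
<a href='/path/to/url.html'>Continue</a>
<a href="http://example.com/path/to/url.html" class="link">Continue</a>
<a style="font-size: 12px" href="http://example.com/path/to/url.html"
   onlick="someFunction('foo','bar')">Continue</a>
<a href="/other.html">Back</a>
HTML;

print_r(extractLinksByText($html, 'Continue'));
```

Because the parser handles quoting and attribute order, the unquoted, single-quoted, and double-quoted href forms should all come back as plain URL strings, with no special cases in the calling code.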