On Fri, 2009-10-23 at 13:23 -0400, Brad Fuller wrote:
> I'm looking for a regular expression to accomplish a specific task.
>
> I'm hoping someone who's really good at regex patterns can lend a quick hand.
>
> I need a regex pattern that will grab URLs out of HTML that have a
> certain link text. (i.e. the word "Continue")
>
> This is what I have so far but it does not work properly (If there are
> other attributes in the <a> tag it returns them as part of the URL.)
>
> preg_match_all('#<a[\s]+[^>]*href\s*=\s*([\"\']+)([^>]+?)(\1|>)>Continue</a>#i',
> $html, $matches);
>
> It needs to be able to extract the URL and disregard arbitrary
> attributes in the HTML tag
>
> Test it with the following examples:
>
> <a href=/path/to/url.html>Continue</a>
> <a href='/path/to/url.html'>Continue</a>
> <a href="http://example.com/path/to/url.html" class="link">Continue</a>
> <a style="font-size: 12px" href="http://example.com/path/to/url.html"
> onclick="someFunction('foo','bar')">Continue</a>
>
> Please reply
>
> Your help is much appreciated.
>
> Thanks in advance,
> Brad F.
>

preg_match_all('#<a[\s]+[^>]*href\s*=\s*[\"\']?([^ \"\'>]+)[^>]*>Continue</a>#i', $html, $matches);

I just changed your regex a bit. Your regex was matching everything
from the first quote after href= right up to the first > it found,
which is usually the one that closes the opening tag, so any other
attributes ended up as part of the URL.

This version stops the capture at the first space, quote or >, then
skips whatever else is in the tag with [^>]*. The quote is optional,
so the unquoted example matches too. The URLs end up in $matches[1].

You could make it a bit more intelligent if you wished by using a
backreference to make sure the closing quote is the same type of
quotation character that opened the href's value.

Thanks,
Ash
http://www.ashleysheridan.co.uk
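
For what it's worth, the backreference idea mentioned above could be sketched roughly like this. It is only one way to do it, not a definitive pattern: the function name extractContinueUrls and the $html sample are made up for illustration, and it assumes the link text is exactly "Continue" with no whitespace or nested tags around it.

```php
<?php
// Hypothetical helper (not from the original thread): returns the href value
// of every <a> tag whose link text is exactly "Continue".
function extractContinueUrls(string $html): array
{
    $pattern = '#<a\s(?:[^>]*\s)?href\s*=\s*'  // href, optionally after other attributes
             . '(?:(["\'])([^>]*?)\1'          // quoted value; \1 must be the same quote that opened it
             . '|([^\s>"\']+))'                // or an unquoted value
             . '[^>]*>Continue</a>#i';         // skip remaining attributes, then the link text

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $urls = [];
    foreach ($matches as $m) {
        // Group 2 holds a quoted URL, group 3 an unquoted one.
        $urls[] = ($m[2] !== '') ? $m[2] : ($m[3] ?? '');
    }
    return $urls;
}

// Sample input: the four test links from the question plus one that should be skipped.
$html = <<<'HTML'
<a href=/path/to/url.html>Continue</a>
<a href='/path/to/url.html'>Continue</a>
<a href="http://example.com/path/to/url.html" class="link">Continue</a>
<a style="font-size: 12px" href="http://example.com/path/to/url.html"
onclick="someFunction('foo','bar')">Continue</a>
<a href="/other.html">Back</a>
HTML;

print_r(extractContinueUrls($html));
```

The value is matched with [^>]*? rather than .*? so that a failed match on one link cannot run on into the next tag. The trade-off is that a URL containing a literal > would not match.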