Re: regex pattern for extracting URLs

Brad Fuller wrote:
> I'm looking for a regular expression to accomplish a specific task.
> 
> I'm hoping someone who's really good at regex patterns can lend a quick hand.
> 
> I need a regex pattern that will grab URLs out of HTML that have a
> certain link text. (i.e. the word "Continue")
> 
> This is what I have so far but it does not work properly (If there are
> other attributes in the <a> tag it returns them as part of the URL.)
> 
>     preg_match_all('#<a[\s]+[^>]*href\s*=\s*([\"\']+)([^>]+?)(\1|>)>Continue</a>#i',
> $html, $matches);
> 
> It needs to be able to extract the URL and disregard arbitrary
> attributes in the HTML tag
> 
> Test it with the following examples:
> 
> <a href=/path/to/url.html>Continue</a>
> <a href='/path/to/url.html'>Continue</a>
> <a href="http://example.com/path/to/url.html" class="link">Continue</a>
> <a style="font-size: 12px" href="http://example.com/path/to/url.html"
> onclick="someFunction('foo','bar')">Continue</a>
> 
> Please reply
> 
> Your help is much appreciated.
> 
> Thanks in advance,
> Brad F.
> 

Treating this document as XML, I think you could do this fairly easily without
using regex at all.  You might look into using DOMDocument and SimpleXML to
complete the task.
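
Here is a rough sketch of the DOM approach, untested against your real pages.
It uses DOMDocument together with DOMXPath (swapped in here for SimpleXML,
since the example links are HTML rather than well-formed XML). The function
name extractLinksByText() and the extra "Back" link are made up for
illustration; the other links are the examples from your message.

```php
<?php
// Sample markup: the four example links from the question, plus one
// hypothetical link ("Back") that should NOT be returned.
$html = <<<'HTML'
<a href=/path/to/url.html>Continue</a>
<a href='/path/to/url.html'>Continue</a>
<a href="http://example.com/path/to/url.html" class="link">Continue</a>
<a style="font-size: 12px" href="http://example.com/path/to/url.html" onclick="someFunction('foo','bar')">Continue</a>
<a href="/other.html">Back</a>
HTML;

// Hypothetical helper: returns the href of every <a> whose text matches
// $linkText (case-insensitive), whatever other attributes the tag has.
function extractLinksByText(string $html, string $linkText): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings quietly
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $urls = [];

    // Select every anchor that has an href, then compare its link text.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        if (strcasecmp(trim($anchor->textContent), $linkText) === 0) {
            $urls[] = $anchor->getAttribute('href');
        }
    }

    return $urls;
}

print_r(extractLinksByText($html, 'Continue'));
```

Because the parser deals with quoting and attribute order, the unquoted,
single-quoted and double-quoted href forms should all come back as plain
URLs, and attributes such as class, style or onclick are simply ignored.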

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

