Re: Regex pattern for preg_match_all

@Simon,

Thanks for explaining the [^href].  I need to read up more on
greediness.  I thought I understood it, but I guess not.
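
For reference, here is a minimal sketch of the two ideas as I understand them. The sample strings ('threshold' and the two-link $tag) are made up for illustration and are not from the earlier messages.

```php
<?php
// [^href] is a negated character class: it matches ONE character that is
// not 'h', 'r', 'e', or 'f'. It does not mean "anything except the word href".
preg_match('/[^href]+/', 'threshold', $m);
echo $m[0], "\n"; // t  ('t' is allowed, the following 'h' ends the run)

$tag = '<a href="/one">1</a><a href="/two">2</a>';

// Greedy: .* takes as much as it can, then backs off to the LAST '"'.
preg_match('/href="(.*)"/', $tag, $greedy);
echo $greedy[1], "\n"; // /one">1</a><a href="/two

// Lazy: .*? takes as little as it can, so it stops at the FIRST '"'.
preg_match('/href="(.*?)"/', $tag, $lazy);
echo $lazy[1], "\n"; // /one
```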

@Peter,

I tried your pattern, but it didn't match all of my new test cases.
It also captures the single/double quotes and the URL fragment along
with the href value.  I couldn't figure out how to modify your
pattern to exclude the ', ", and URL fragment from the group 1 matches.

Below is the new pattern, along with the new sample test cases I used
to get it working.  It fails on only one of the non-compliant cases.

$html = <<<HTML
<a href=/sample/link>content</a>
<a class=link href=/sample/link_extra_attribs title=sample
link>content link_extra_attribs</a>
<a href='/sample/link_single_quote'>content link_single_quote</a>
<a class='link' href='/sample/link_single_quote_pre_attribs'>content
link_single_quote_pre_attribs</a>
<a class='link' href='/sample/link_single_quote_extra_attribs'
title='sample link'>content link_single_quote_extra_attribs</a>
<a class='link'
href='/sample/link_single_quote_extra_attribs_frag#fragment'
title='sample link'>content
link_single_quote_extra_attribs_frag#fragment</a>
<a class='link'
href='/sample/link_single_quote_extra_attribs_query_frag?par=val#fragment'
title='sample link'>content
link_single_quote_extra_attribs_query_frag?par=val#fragment</a>
<a href="/sample/link_double_quote">content link_double_quote</a>
<a class="link" href="/sample/link_double_quote_pre_attribs">content
link_double_quote_pre_attribs</a>
<a class="link"
href="/sample/link_double_quote_extra_attribs_frag#fragment"
title="sample link">content
link_double_quote_extra_attribs_frag#fragment</a>
<a class="link"
href="/sample/link_double_quote_extra_attribs_nested_tag"
title="sample link"><img class="image" src="/images/content.jpg"
alt="content" title="content">
link_double_quote_extra_attribs_nested_tag</a>
<a href="#fragment">content fragment</a>
<a class="link" href="#fragment" title="sample link">content fragment</a>
<li class="small  tab "><a class="y-mast-link images"
href="http://images.search.yahoo.com/images"
data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide"
style="padding-left:0em;padding-right:0em;">Images</span></a></li>
<li class="small  tab "><a class="y-mast-link video"
href="http://video.search.yahoo.com/video"
data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide"
style="padding-left:0em;padding-right:0em;">Video</span></a></li>
<li class="small  tab "><a class="y-mast-link local"
href="http://local.yahoo.com/results"
data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide"
style="padding-left:0em;padding-right:0em;">Local</span></a></li>
<li class="small  tab "><a class="y-mast-link shopping"
href="http://shopping.yahoo.com/search"
data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide"
style="padding-left:0em;padding-right:0em;">Shopping</span></a></li>
<li class="small lasttab more-tab "><a class="y-mast-link more"
href="http://tools.search.yahoo.com/about/forsearchers.html" ><span
class="tab-cover y-mast-bg-hide">More</span><span
class="y-fp-pg-controls arrow"></span></a></li>
HTML;

$pattern = '%<a[\s]+[^>]*?href\s*=\s*["\']?([^"\'#>]*)["\']?\s?[^>]*>(.*?)</a>%ims';

preg_match_all($pattern, $html, $matches);
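
To show what the groups contain, here is a cut-down sketch that runs the same pattern over four shortened links modeled on the test cases above (the shortened paths are my own). The failing case appears to be the unquoted href followed by other unquoted attributes, because group 1 only stops at a quote, '#', or '>'. The $variant pattern at the end is an untested-in-general assumption: it also excludes whitespace from group 1, which would break quoted URLs that contain spaces.

```php
<?php
// Shortened sample modeled on the test cases above (one line per link).
$html = <<<HTML
<a href=/sample/link>content</a>
<a class='link' href='/sample/frag#fragment' title='sample link'>content frag</a>
<a href="#fragment">content fragment</a>
<a class=link href=/sample/extra title=sample link>content extra</a>
HTML;

$pattern = '%<a[\s]+[^>]*?href\s*=\s*["\']?([^"\'#>]*)["\']?\s?[^>]*>(.*?)</a>%ims';

// PREG_SET_ORDER groups the results per link: [full match, href, content].
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);

foreach ($sets as $set) {
    printf("href=[%s] content=[%s]\n", $set[1], $set[2]);
}
// href=[/sample/link] content=[content]
// href=[/sample/frag] content=[content frag]
// href=[] content=[content fragment]
// href=[/sample/extra title=sample link] content=[content extra]

// Hypothetical variant (an assumption, not a verified fix): exclude
// whitespace from group 1 so an unquoted value ends at the first space.
$variant = '%<a\s+[^>]*?href\s*=\s*["\']?([^"\'#>\s]*)["\']?\s?[^>]*>(.*?)</a>%ims';
preg_match_all($variant, $html, $fixed, PREG_SET_ORDER);
echo $fixed[3][1], "\n"; // /sample/extra
```

Fragment-only links such as href="#fragment" give an empty group 1 with either pattern, since the capture stops at the '#'.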

Thanks for your time,
Tommy

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


