On 18 February 2011 22:36, Tommy Pham <tommyhp2@xxxxxxxxx> wrote:
> Hi folks,
>
> This is not directly relating to PHP but it's Friday so I'm gonna give
> it a shot :).  Would someone please help me figure out why my regex
> pattern doesn't work.  Below is the code and sample data:
>
> $html = <<<HTML
> <li class="small  tab "><a class="y-mast-link images"
> href="http://images.search.yahoo.com/images"
> data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide"
> style="padding-left:0em;padding-right:0em;">Images</span></a></li>
> <li class="small  tab "><a class="y-mast-link video"
> href="http://video.search.yahoo.com/video"
> data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide"
> style="padding-left:0em;padding-right:0em;">Video</span></a></li>
> <li class="small  tab "><a class="y-mast-link local"
> href="http://local.yahoo.com/results"
> data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide"
> style="padding-left:0em;padding-right:0em;">Local</span></a></li>
> <li class="small  tab "><a class="y-mast-link shopping"
> href="http://shopping.yahoo.com/search"
> data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide"
> style="padding-left:0em;padding-right:0em;">Shopping</span></a></li>
> <li class="small lasttab more-tab "><a class="y-mast-link more"
> href="http://tools.search.yahoo.com/about/forsearchers.html" ><span
> class="tab-cover y-mast-bg-hide">More</span><span
> class="y-fp-pg-controls arrow"></span></a></li>
> HTML;
>
> $pattern = '%<a\s[^href]*href\s*=\s*[\'|"]?([^\'|"|#]+)[\'|"]?\s*[^>]*>(.*)?</a>%im';
> preg_match_all($pattern, $html, $matches);
>
> The only matches I got is:
>
> Match 1 of 1:   <a class="y-mast-link local"
> href="http://local.yahoo.com/results"
> data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide"
> style="padding-left:0em;padding-right:0em;">Local</span></a>
>
> Group 1:        http://local.yahoo.com/results
>
> Group 2:        <span class="tab-cover
> y-mast-bg-hide"
> style="padding-left:0em;padding-right:0em;">Local</span>
>
> The pattern I made was to work in cases where the page is
> non-compliant to any of standard W3.
>

Not entirely sure what your input data is, as I'm guessing one or more
mail programs may have added line breaks. When I run the code I get no
matches at all - so I'm guessing you might have different input on your
end. More specifically, I'm also guessing you have line breaks on your
end, but not equally distributed - which would explain the one hit.

Apart from that, there are a couple of things I'd rework in your regex:

%<a\s+.*?(?!href)\s+href\s*=\s*([^\s\'"]+|\'[^\']+\'|\"[^\"]+\")[^>]*>(.*?)</a>%ims

* added a quantifier (+) to the first whitespace
* allowing for any character not followed by href (non-greedy)
* match the href
* use proper alternation
* capture anything inside the <a> tag, non-greedy
* match with a closing </a> tag

Results:

array(3) {
  [0]=>
  array(5) {
    [0]=>
    string(205) "<a class="y-mast-link images" href="http://images.search.yahoo.com/images" data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide" style="padding-left:0em;padding-right:0em;">Images</span></a>"
    [1]=>
    string(201) "<a class="y-mast-link video" href="http://video.search.yahoo.com/video" data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide" style="padding-left:0em;padding-right:0em;">Video</span></a>"
    [2]=>
    string(196) "<a class="y-mast-link local" href="http://local.yahoo.com/results" data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide" style="padding-left:0em;padding-right:0em;">Local</span></a>"
    [3]=>
    string(204) "<a class="y-mast-link shopping" href="http://shopping.yahoo.com/search" data-b="http://www.yahoo.com"><span class="tab-cover y-mast-bg-hide" style="padding-left:0em;padding-right:0em;">Shopping</span></a>"
    [4]=>
    string(188) "<a class="y-mast-link more" href="http://tools.search.yahoo.com/about/forsearchers.html" ><span class="tab-cover
y-mast-bg-hide">More</span><span class="y-fp-pg-controls arrow"></span></a>"
  }
  [1]=>
  array(5) {
    [0]=>
    string(39) ""http://images.search.yahoo.com/images""
    [1]=>
    string(37) ""http://video.search.yahoo.com/video""
    [2]=>
    string(32) ""http://local.yahoo.com/results""
    [3]=>
    string(34) ""http://shopping.yahoo.com/search""
    [4]=>
    string(55) ""http://tools.search.yahoo.com/about/forsearchers.html""
  }
  [2]=>
  array(5) {
    [0]=>
    string(96) "<span class="tab-cover y-mast-bg-hide" style="padding-left:0em;padding-right:0em;">Images</span>"
    [1]=>
    string(95) "<span class="tab-cover y-mast-bg-hide" style="padding-left:0em;padding-right:0em;">Video</span>"
    [2]=>
    string(95) "<span class="tab-cover y-mast-bg-hide" style="padding-left:0em;padding-right:0em;">Local</span>"
    [3]=>
    string(98) "<span class="tab-cover y-mast-bg-hide" style="padding-left:0em;padding-right:0em;">Shopping</span>"
    [4]=>
    string(94) "<span class="tab-cover y-mast-bg-hide">More</span><span class="y-fp-pg-controls arrow"></span>"
  }
}

--
<hype>
WWW: plphp.dk / plind.dk
LinkedIn: plind
BeWelcome/Couchsurfing: Fake51
Twitter: kafe15
</hype>
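
For anyone wanting to reproduce the difference between the two patterns, here is a minimal sketch that runs both side by side. It assumes each tag sits on a single line (the mail-wrapped sample above may not reflect the real input), and the three shortened sample tags are illustrative stand-ins, not the thread's data verbatim. One thing it makes visible: [^href]* is a character class, so it matches a run of characters other than h, r, e and f rather than "anything that is not the word href". That is probably why only class="y-mast-link local" got through, since that attribute contains none of those letters.

```php
<?php
// Shortened sample tags modelled on the data in the thread, one tag per line.
// Assumption: the real input had no line breaks inside a tag.
$html = <<<HTML
<li><a class="y-mast-link images" href="http://images.search.yahoo.com/images"><span>Images</span></a></li>
<li><a class="y-mast-link local" href="http://local.yahoo.com/results"><span>Local</span></a></li>
<li><a class="y-mast-link more" href="http://tools.search.yahoo.com/about/forsearchers.html" ><span>More</span></a></li>
HTML;

// Pattern from the question. [^href]* stops at the first h, r, e or f,
// so "images" and "more" in the class attribute block the match.
$original = '%<a\s[^href]*href\s*=\s*[\'|"]?([^\'|"|#]+)[\'|"]?\s*[^>]*>(.*)?</a>%im';

// Pattern from the reply, unchanged.
$reworked = '%<a\s+.*?(?!href)\s+href\s*=\s*([^\s\'"]+|\'[^\']+\'|\"[^\"]+\")[^>]*>(.*?)</a>%ims';

preg_match_all($original, $html, $before);
preg_match_all($reworked, $html, $after);

// The reworked pattern keeps the surrounding quotes in group 1, so trim them.
$urls = array_map(function ($u) {
    return trim($u, '\'"');
}, $after[1]);

print_r($before[1]); // URLs captured by the original pattern
print_r($urls);      // URLs captured by the reworked pattern, quotes removed
```

With this input, $before[1] holds only the local.yahoo.com URL, while $urls holds all three. One caveat worth checking against your own data: the reworked pattern needs whitespace twice before href, so a tag with href as its first attribute, such as <a href="x">y</a>, does not match on its own.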