This worked here:

<?php
// Load the page and collect every <a> element.
$html = new DOMDocument();
$html->loadHTMLFile("testHtml.html");
$links = $html->getElementsByTagName('a');

echo "<pre>";
foreach ($links as $item) {
    // href attribute first, then the link text (nodeValue is a property, not a method).
    echo $item->getAttribute('href') . "\n";
    echo "-------" . $item->nodeValue . "\n";
}
echo "</pre>";
?>

I'm sending you the 2 files directly in a minute.
It turned out, as I thought earlier, that you have to check whether the <a> tags
have children in order to extract image links.

ralph_deffke@xxxxxxxx

"chrysanhy" <phplists@xxxxxxxxxxxxxxxx> wrote in message
news:88827b190908160943t2254137fve43771c7e4f8cc18@xxxxxxxxxxxxxxxxx
> While waiting for suggestions for extracting the link text from the DOM, I
> tried a brute force approach using the URLs I had found with getAttribute(),
> but found myself baffled by my results. I boiled down my issue with this
> approach to the following snippet.
>
> $htmldata =<<<EOB
> http://www.protools.com/users/user_story.cfm?story_id=1162&lang=1">"Creating
>
> Surround Mixes with Tim Weidner</a>" <img height="11"
> src="new.gif" width="28">
> - <i>Magnification</i> engineer talks about mixing the album at
> the
> <i>ProTools</i> site, by Jim Batchco
> http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html">"Don't
> Go" Video</a><a href="
> http://fi.soneraplaza.net/kaista/musiq/kaistatv/0,8883,201392,00.html"></a>
> <img height="11" src="new.gif" width="28"> - Presented by Beyond
> Music
> (<a href="http://www.apple.com/quicktime/download/">QuickTime</a>
>
> Required)
> EOB;
> $url = 'http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html';
> $posn = strpos($url, $htmldata);
> echo "URL |$url| position is |$posn|";
>
> Running this gives me:
>
> URL |http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html| position is ||
>
> I've tried lots of functions, and even regular expressions, but I cannot get
> the code to find the URL in the HTML. While I still hope for a DOM solution
> to getting this link text, WHY can't the code find the URL in the HTML
> snippet?
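
On the strpos() question in the quoted snippet: strpos() takes the haystack first and the needle second, so strpos($url, $htmldata) searches for the whole HTML block inside the short URL. That returns false, and echo prints false as an empty string, which would explain the "||". A minimal sketch with the arguments swapped, assuming the URL sits unbroken on one line of the HTML; the one-line $htmldata here is a made-up stand-in, not the original heredoc:

```php
<?php
// Hypothetical stand-in for the HTML in the quoted snippet.
$htmldata = '<a href="http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html">"Don\'t Go" Video</a>';
$url = 'http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html';

// Reversed order, as in the quoted code: looks for the HTML inside the URL.
$wrong = strpos($url, $htmldata);

// Intended order: haystack (the HTML) first, needle (the URL) second.
$posn = strpos($htmldata, $url);

// Compare with === false, because a match at offset 0 is also "falsy".
if ($posn === false) {
    echo "URL |$url| not found\n";
} else {
    echo "URL |$url| position is |$posn|\n";
}
```

The strict comparison matters whenever the needle could sit at the very start of the haystack.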
>
> On Sun, Aug 16, 2009 at 9:29 AM, chrysanhy <phplists@xxxxxxxxxxxxxxxx> wrote:
>
> > I pasted the code exactly as you have it, and I got the following:
> >
> > *Fatal error*: Call to undefined method DOMElement::getContent()
> >
> > I got the same thing with nodeValue().
> >
> >
> > On Sun, Aug 16, 2009 at 7:35 AM, Ralph Deffke <ralph_deffke@xxxxxxxx> wrote:
> >
> >> did u try it something like this
> >>
> >> foreach ($links as $link) {
> >>     $int_url_list[$i]["href"] = $link->getAttribute( 'href' );
> >>     $int_url_list[$i++]["linkText"] = $link->getContent( ); // nodeValue();
> >> }
> >> that should work
> >>
> >> send ur code then please
> >> ralph_deffke@yahoo,de
> >>
> >>
> >> "chrysanhy" <phplists@xxxxxxxxxxxxxxxx> wrote in message
> >> news:88827b190908160033n226b370bqe2ab70732811b27@xxxxxxxxxxxxxxxxx
> >> > I have the following code to extract the URLs from the anchor tags of an
> >> > HTML page:
> >> >
> >> > $html = new DOMDocument();
> >> > $htmlpage->loadHtmlFile($location);
> >> > $xpath = new DOMXPath($htmlpage);
> >> > $links = $xpath->query( '//a' );
> >> > foreach ($links as $link)
> >> > { $int_url_list[$i++] = $link->getAttribute( 'href' ) . "\n"; }
> >> >
> >> > If I have a link <a href="http://X.com">YYYY</a>, how do I extract the
> >> > corresponding YYYY which is displayed to the user as the text of the link
> >> > (if it's an image tag, I would like a DOMElement for that).
> >> > Thanks
> >> >
> >>
> >> --
> >> PHP General Mailing List (http://www.php.net/)
> >> To unsubscribe, visit: http://www.php.net/unsub.php
> >>
> >
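
For the original question (link text, plus a DOMElement when the link wraps an image), here is a rough sketch of the "check the <a> for children" idea. The sample markup and the 'linkText' / 'images' array keys are assumptions made up for illustration, since the real page is not shown in the thread. Note that nodeValue is read as a property; calling it as a method is what produces the fatal error.

```php
<?php
// Hypothetical sample markup; substitute the real page.
$sample = '<html><body>'
    . '<a href="http://X.com">YYYY</a>'
    . '<a href="http://example.com/pic"><img src="new.gif" width="28" height="11"></a>'
    . '</body></html>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($sample);
libxml_clear_errors();

$int_url_list = array();
foreach ($doc->getElementsByTagName('a') as $link) {
    $entry = array(
        'href'     => $link->getAttribute('href'),
        'linkText' => trim((string) $link->nodeValue), // property, not a method
        'images'   => array(),
    );
    // Any <img> descendants of this <a>, each one a DOMElement.
    foreach ($link->getElementsByTagName('img') as $img) {
        $entry['images'][] = $img;
    }
    $int_url_list[] = $entry;
}

foreach ($int_url_list as $entry) {
    echo $entry['href'], ' | text: "', $entry['linkText'],
         '" | images: ', count($entry['images']), "\n";
}
```

An image-only link ends up with an empty linkText and one entry in images, so the two cases can be told apart afterwards.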