Re: Re: How do I extract link text from anchor tag as well as the URL from the "href" attribute

"Ralph Deffke" <ralph_deffke@xxxxxxxx> · Sun, 16 Aug 2009 20:26:08 +0200

this worked here:
<?php

$html = new DOMDocument();
$html->loadHtmlFile("testHtml.html");
$links = $html->getElementsByTagName('a');
echo "<pre>";

foreach ($links as $item) {
  echo $item->getAttribute( 'href' ). "\n";
  echo "-------" . $item->nodeValue . "\n";
}

echo "</pre>";

?>
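
On the strpos() question quoted below: a likely cause, assuming the snippet runs as posted, is the argument order. strpos() takes the string to search in first and the string to look for second, so strpos($url, $htmldata) looks for the whole HTML block inside the URL and returns false, which echo prints as an empty string. A minimal sketch, with a shortened stand-in for the heredoc (the variable names follow the quoted snippet; the sample markup is cut down for illustration):

```php
<?php
// Shortened stand-in for the heredoc in the quoted message (illustrative only).
$htmldata = '<a href="http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html">&quot;Don\'t Go&quot; Video</a>';
$url = 'http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html';

// Haystack first, needle second.
$posn = strpos($htmldata, $url);

// strpos() returns false when nothing is found, and 0 is a valid position,
// so compare with === instead of relying on truthiness.
if ($posn === false) {
    echo "URL |$url| not found\n";
} else {
    echo "URL |$url| position is |$posn|\n";
}

// The reversed call searches for the HTML inside the URL and finds nothing.
$reversed = strpos($url, $htmldata);
```

The strict comparison matters because a match at the very start of the string gives position 0, which a loose check would treat the same as "not found".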

I'm sending you the two files directly in a minute. It turned out, as I thought
earlier, that you have to check whether each <a> tag has child elements in
order to extract image links.
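
A sketch of that check, under a few assumptions: the markup is loaded from an inline string instead of testHtml.html, and the sample HTML, the example.com URLs and the array keys are made up for illustration. For each <a> it collects any <img> elements nested inside it; nodeValue is read as a property, not called as a method.

```php
<?php
// Illustrative markup only; the real page would come from loadHTMLFile().
$sample = '<html><body>'
        . '<a href="http://example.com/text">Plain text link</a>'
        . '<a href="http://example.com/img"><img src="new.gif" alt="New"></a>'
        . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($sample);

$found = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $entry = array(
        'href'     => $a->getAttribute('href'),
        'linkText' => trim($a->nodeValue), // property, not a method
        'images'   => array(),
    );
    // <img> elements nested anywhere inside this anchor.
    foreach ($a->getElementsByTagName('img') as $img) {
        $entry['images'][] = $img; // kept as a DOMElement
    }
    $found[] = $entry;
}

// Print the URL, the link text, and the src of any nested image.
foreach ($found as $entry) {
    echo $entry['href'] . "\n";
    echo "-------" . $entry['linkText'] . "\n";
    foreach ($entry['images'] as $img) {
        echo "-------[img] " . $img->getAttribute('src') . "\n";
    }
}
```

An anchor that wraps only an image has an empty linkText here, so the images list is what tells the two kinds of link apart.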

ralph_deffke@xxxxxxxx

"chrysanhy" <phplists@xxxxxxxxxxxxxxxx> wrote in message
news:88827b190908160943t2254137fve43771c7e4f8cc18@xxxxxxxxxxxxxxxxx
> While waiting for suggestions for extracting the link text from the DOM, I
> tried a brute-force approach using the URLs I had found with getAttribute(),
> but found myself baffled by my results. I boiled down my issue with this
> approach to the following snippet.
>
> $htmldata =<<<EOB
> <a href="http://www.protools.com/users/user_story.cfm?story_id=1162&amp;lang=1">&quot;Creating
>             Surround Mixes with Tim Weidner</a>&quot; <img height="11" src="new.gif" width="28">
>             - <i>Magnification</i> engineer talks about mixing the album at the
>             <i>ProTools</i> site, by Jim Batchco
> <a href="http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html">&quot;Don't
>             Go&quot; Video</a><a href="http://fi.soneraplaza.net/kaista/musiq/kaistatv/0,8883,201392,00.html"></a>
>             <img height="11" src="new.gif" width="28"> - Presented by Beyond Music
>             (<a href="http://www.apple.com/quicktime/download/">QuickTime</a>
>             Required)
> EOB;
> $url = 'http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html';
> $posn = strpos($url, $htmldata);
> echo "URL |$url| position is |$posn|";
>
> Running this gives me:
>
> URL |http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html| position is ||
>
> I've tried lots of functions, and even regular expressions, but I cannot get
> the code to find the URL in the HTML. While I still hope for a DOM solution
> to getting this link text, WHY can't the code find the URL in the HTML
> snippet?
>
> On Sun, Aug 16, 2009 at 9:29 AM, chrysanhy <phplists@xxxxxxxxxxxxxxxx> wrote:
>
> > I pasted the code exactly as you have it, and I got the following:
> >
> > *Fatal error*: Call to undefined method DOMElement::getContent()
> >
> > I got the same thing with nodeValue().
> >
> >
> > On Sun, Aug 16, 2009 at 7:35 AM, Ralph Deffke <ralph_deffke@xxxxxxxx> wrote:
> >
> >> did u try it something like this
> >>
> >> foreach ($links as $link) {
> >>    $int_url_list[$i]["href"] = $link->getAttribute( 'href' );
> >>    $int_url_list[$i++]["linkText"] = $link->getContent(  ); // nodeValue();
> >> }
> >> that should work
> >>
> >> send ur code then please
> >> ralph_deffke@yahoo,de
> >>
> >>
> >> "chrysanhy" <phplists@xxxxxxxxxxxxxxxx> wrote in message
> >> news:88827b190908160033n226b370bqe2ab70732811b27@xxxxxxxxxxxxxxxxx
> >> > I have the following code to extract the URLs from the anchor tags of an
> >> > HTML page:
> >> >
> >> > $htmlpage = new DOMDocument();
> >> > $htmlpage->loadHtmlFile($location);
> >> > $xpath = new DOMXPath($htmlpage);
> >> > $links = $xpath->query( '//a' );
> >> > foreach ($links as $link)
> >> > { $int_url_list[$i++] = $link->getAttribute( 'href' ) . "\n"; }
> >> >
> >> > If I have a link <a href="http://X.com">YYYY</a>, how do I extract the
> >> > corresponding YYYY which is displayed to the user as the text of the link
> >> > (if it's an image tag, I would like a DOMElement for that).
> >> > Thanks
> >> >
> >>
> >>
> >>
> >> --
> >> PHP General Mailing List (http://www.php.net/)
> >> To unsubscribe, visit: http://www.php.net/unsub.php
> >>
> >>
> >
>
