Re: Re: How do I extract link text from anchor tag as well as the URL from the "href" attribute

chrysanhy <phplists@xxxxxxxxxxxxxxxx> · Sun, 16 Aug 2009 09:43:35 -0700

While waiting for suggestions for extracting the link text from the DOM, I
tried a brute force approach using the URLs I had found with getAttribute(),
but found myself baffled by my results. I boiled down my issue with this
approach to the following snippet.

$htmldata =<<<EOB
<a href="http://www.protools.com/users/user_story.cfm?story_id=1162&amp;lang=1">&quot;Creating
            Surround Mixes with Tim Weidner</a>&quot; <img height="11" src="new.gif" width="28">
            - <i>Magnification</i> engineer talks about mixing the album at the
            <i>ProTools</i> site, by Jim Batchco
<a href="http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html">&quot;Don't
            Go&quot; Video</a><a href="http://fi.soneraplaza.net/kaista/musiq/kaistatv/0,8883,201392,00.html"></a>
            <img height="11" src="new.gif" width="28"> - Presented by Beyond Music
            (<a href="http://www.apple.com/quicktime/download/">QuickTime</a>
            Required)
EOB;
$url = 'http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html';
$posn = strpos($url, $htmldata);
echo "URL |$url| position is |$posn|";

Running this gives me:

URL |http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html| position is ||

I've tried lots of functions, and even regular expressions, but I cannot get
the code to find the URL in the HTML. While I still hope for a DOM solution
to getting this link text, WHY can't the code find the URL in the HTML
snippet?
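
For comparison, here is a minimal sketch with the arguments the other way
round. strpos() expects the string being searched (the haystack) first and
the string to look for (the needle) second; with the two swapped it returns
false, which echo prints as an empty string. The shortened $htmldata is a
stand-in for illustration, not the full HTML from the snippet.

```php
<?php
// Hypothetical, shortened stand-in for the HTML in the post.
$htmldata = '<a href="http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html">&quot;Don\'t Go&quot; Video</a>';
$url = 'http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html';

// strpos(haystack, needle): the text being searched comes first.
$posn = strpos($htmldata, $url);

// strpos() returns false when nothing is found, and 0 is a valid position,
// so compare with === instead of relying on truthiness.
if ($posn === false) {
    echo "URL |$url| not found\n";
} else {
    echo "URL |$url| position is |$posn|\n";
}
```

The strict comparison matters because a match at the very start of the
string yields 0, which a loose test would treat the same as "not found".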

On Sun, Aug 16, 2009 at 9:29 AM, chrysanhy <phplists@xxxxxxxxxxxxxxxx> wrote:

> I pasted the code exactly as you have it, and I got the following:
>
> *Fatal error*: Call to undefined method DOMElement::getContent()
>
> I got the same thing with nodeValue().
>
>
> On Sun, Aug 16, 2009 at 7:35 AM, Ralph Deffke <ralph_deffke@xxxxxxxx> wrote:
>
>> Did you try something like this?
>>
>> foreach ($links as $link) {
>>    $int_url_list[$i]["href"] = $link->getAttribute( 'href' );
>>    $int_url_list[$i++]["linkText"] = $link->getContent(  ); // nodeValue();
>> }
>> that should work
>>
>> send ur code then please
>> ralph_deffke@yahoo,de
>>
>>
>> "chrysanhy" <phplists@xxxxxxxxxxxxxxxx> wrote in message
>> news:88827b190908160033n226b370bqe2ab70732811b27@xxxxxxxxxxxxxxxxx
>> > I have the following code to extract the URLs from the anchor tags of an
>> > HTML page:
>> >
>> > $htmlpage = new DOMDocument();
>> > $htmlpage->loadHTMLFile($location);
>> > $xpath = new DOMXPath($htmlpage);
>> > $links = $xpath->query( '//a' );
>> > foreach ($links as $link)
>> > { $int_url_list[$i++] = $link->getAttribute( 'href' ) . "\n"; }
>> >
>> > If I have a link <a href="http://X.com">YYYY</a>, how do I extract the
>> > corresponding YYYY which is displayed to the user as the text of the link
>> > (if it's an image tag, I would like a DOMElement for that).
>> > Thanks
>> >
>>
>>
>>
>> --
>> PHP General Mailing List (http://www.php.net/)
>> To unsubscribe, visit: http://www.php.net/unsub.php
>>
>>
>
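
On the original question of reading the link text, here is a sketch, assuming
the anchors come back as DOMElement objects as in the quoted code. textContent
and nodeValue are properties of DOMNode, not methods, so they are read
without parentheses. The inline HTML and the 'images' key are made up for
illustration; the quoted code loads a page with loadHTMLFile($location)
instead.

```php
<?php
// Hypothetical inline HTML standing in for the page at $location.
$html = '<p><a href="http://X.com">YYYY</a> '
      . '<a href="http://Y.com"><img src="new.gif" alt=""></a></p>';

$htmlpage = new DOMDocument();
$htmlpage->loadHTML($html);
$xpath = new DOMXPath($htmlpage);

$int_url_list = array();
foreach ($xpath->query('//a') as $link) {
    // Collect any <img> elements nested inside this anchor as DOMElement objects.
    $images = array();
    foreach ($link->getElementsByTagName('img') as $img) {
        $images[] = $img;
    }

    $int_url_list[] = array(
        'href'     => $link->getAttribute('href'),
        'linkText' => trim($link->textContent), // property, no parentheses
        'images'   => $images,
    );
}

foreach ($int_url_list as $entry) {
    echo $entry['href'] . ' => |' . $entry['linkText'] . '| images: '
       . count($entry['images']) . "\n";
}
```

An anchor that wraps only an image ends up with an empty linkText and one
entry in 'images', which covers the image-link case from the question.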