Re: Subject: Searching remote web sites for content

At 16:17 22/10/2005, you wrote:
Message-ID: <017701c5d71b$b3fd7bf0$6401a8c0@john7gg8ipktgf>
From: "ioannes" <ioannes@xxxxxxxxxxxxxx>
To: <php-db@xxxxxxxxxxxxx>
Date: Sat, 22 Oct 2005 16:17:22 +0100
Subject: Searching remote web sites for content

I have a web site and google likes to count the inbound links. I have set up a way for people to add links from my site to theirs, however I would like to check whether they have linked back to my site. I ask them to nominate where the link back page is, and I could check this manually. But is there a way to check whether the remote page links back using a php script, so that I could get a report and follow up on exceptions, without having to check all pages that say they link to my site?

Yes, you can - exploit Google's search to do this.

You need to run a query for "link:mysite.mydomain.com" and then screen-scrape the results, i.e. you'd fetch the result pages with curl or fopen() using, for example:

http://www.google.co.uk/search?q=link:www.captionkit.com&hl=en&lr=&start=10&sa=N
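As a rough illustration of that fetch step, here is a minimal PHP sketch. The function names are made up for this example, the query parameters are simply copied from the URL above, and stepping "start" in tens is an assumption based on that URL. It uses file_get_contents() rather than curl or fopen(), which only works if allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: builds the Google "link:" query URL for one result page.
// The parameter names (q, hl, start) are copied from the example URL above;
// paging in steps of 10 is an assumption.
function build_link_query_url(string $site, int $start = 0): string
{
    $params = [
        'q'     => 'link:' . $site,
        'hl'    => 'en',
        'start' => $start,
    ];
    // http_build_query() URL-encodes the values (":" becomes "%3A").
    return 'http://www.google.co.uk/search?' . http_build_query($params, '', '&');
}

// Hypothetical helper: fetches one result page, or returns null on failure.
// file_get_contents() on a URL requires allow_url_fopen to be enabled.
function fetch_result_page(string $site, int $start = 0): ?string
{
    $html = @file_get_contents(build_link_query_url($site, $start));
    return $html === false ? null : $html;
}

// Show the URL that would be requested for the second page of results.
echo build_link_query_url('www.captionkit.com', 10), "\n";
```

You would call fetch_result_page() in a loop, raising $start each time, until a page comes back with no results.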

Then, for each page returned, use a regex to pull the result links out of the HTML that Google sends back, e.g. from a result like:

<p class=g><a href="http://archive.netbsd.se/?ml=php-database&a=2004-10&m=430433" onmousedown="return clk(this.href,'res','18','')">archive.netbsd.se - NetBSD Sverige</a>

You just want a capture pattern to extract the href value, which you then store in your database. Before you accuse anybody of anything, make sure you've waited a few days for Google to re-spider their site. If their site doesn't appear in the index at all, it may be because Google doesn't or can't spider it, rather than because the back link isn't there. In that case, though, their link contributes nothing to your link popularity and may as well be ignored!
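A minimal sketch of such a capture pattern in PHP follows. It assumes every result is marked up like the sample line above (a "p class=g" followed directly by the anchor), which is only what that one sample shows and may not hold for every result or survive changes on Google's side. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: returns the href of every result link in one page.
// The "<p class=g><a href=..." shape is taken from the sample line above
// and is an assumption about Google's markup.
function extract_result_links(string $html): array
{
    $pattern = '/<p class=g><a href="([^"]+)"/i';
    preg_match_all($pattern, $html, $matches);
    // $matches[1] holds the first capture group (the href) for each match.
    return $matches[1];
}

// Hypothetical helper: true if any extracted link points at the partner's host.
function host_is_listed(string $partnerHost, array $links): bool
{
    foreach ($links as $link) {
        $host = parse_url($link, PHP_URL_HOST);
        if (is_string($host) && strcasecmp($host, $partnerHost) === 0) {
            return true;
        }
    }
    return false;
}
```

Running extract_result_links() over each fetched page and then host_is_listed() for each partner's host gives you the list of exceptions to follow up by hand.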

HTH
Cheers - Neil

--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

