At 16:17 22/10/2005, you wrote:
From: "ioannes" <ioannes@xxxxxxxxxxxxxx>
To: <php-db@xxxxxxxxxxxxx>
Date: Sat, 22 Oct 2005 16:17:22 +0100
Subject: Searching remote web sites for content
I have a web site and Google likes to count the inbound links. I
have set up a way for people to add links from my site to theirs;
however, I would like to check whether they have linked back to my
site. I ask them to nominate where the link-back page is, and I
could check this manually. But is there a way to check whether the
remote page links back using a PHP script, so that I could get a
report and follow up on exceptions, without having to check all
pages that say they link to my site?
Yes, you can - exploit Google's search to do this.
You need to run a query for "link:mysite.mydomain.com" and then
screen-scrape the results, i.e. you'd use curl or fopen() to fetch each results page with, for example,
http://www.google.co.uk/search?q=link:www.captionkit.com&hl=en&lr=&start=10&sa=N
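A minimal sketch of that fetch step, assuming the results URL accepts the q, hl and start parameters seen in the example URL (the lr and sa parameters are left out here). The function names build_link_query_url() and fetch_page() are hypothetical, not part of any library, and the timeout value is arbitrary.

```php
<?php
// Hypothetical helper: builds a results-page URL in the same shape as the
// example URL, for a given site and result offset.
function build_link_query_url(string $site, int $start = 0): string
{
    $params = [
        'q'     => 'link:' . $site, // the "link:" query described in the text
        'hl'    => 'en',
        'start' => $start,          // offset of the first result on the page
    ];
    // http_build_query() URL-encodes the values, so ':' becomes %3A.
    return 'http://www.google.co.uk/search?' . http_build_query($params, '', '&');
}

// Hypothetical helper: fetches a page with the curl extension and returns
// the body as a string, or null if the request failed.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // assumed timeout, in seconds
    $html = curl_exec($ch);
    return $html === false ? null : $html;
}
```

To walk through several results pages you would call build_link_query_url() with start set to 0, 10, 20 and so on, passing each URL to fetch_page().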
Then, for each page returned, use a regex to extract the result links from the HTML
that Google returns, e.g. from
<p class=g><a
href="http://archive.netbsd.se/?ml=php-database&a=2004-10&m=430433"
onmousedown="return clk(this.href,'res','18','')">archive.netbsd.se -
NetBSD Sverige</a>
You just want a capture pattern to extract the href value, which you
then store in your database. Before you accuse anybody of anything,
make sure you've waited a few days for Google to re-spider their site.
If their site doesn't appear in the index at all, it may be because
Google doesn't or can't spider it, rather than because the back link isn't
there - but in that case their link contributes nothing to link popularity and may
as well be ignored!
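A rough sketch of the capture pattern, assuming each result is marked up exactly like the sample above (<p class=g> followed by <a href="...">). If the markup differs, the pattern needs adjusting. The function names extract_result_links() and host_links_back() are hypothetical, and storing the links in a database is left out.

```php
<?php
// Hypothetical helper: returns the href of every result link in a results
// page, assuming the <p class=g><a href="..."> markup from the sample.
function extract_result_links(string $html): array
{
    // \s+ allows the line break between "<a" and "href" seen in the sample;
    // the parentheses capture everything between the double quotes.
    $pattern = '/<p class=g><a\s+href="([^"]+)"/i';
    preg_match_all($pattern, $html, $matches);
    // $matches[1] holds the captured href values; drop duplicates.
    return array_values(array_unique($matches[1]));
}

// Hypothetical helper: reports whether any extracted link is hosted on the
// partner's host name, compared case-insensitively.
function host_links_back(array $links, string $partnerHost): bool
{
    foreach ($links as $link) {
        $host = parse_url($link, PHP_URL_HOST);
        if (is_string($host) && strcasecmp($host, $partnerHost) === 0) {
            return true;
        }
    }
    return false;
}
```

Partners for whom host_links_back() returns false are the exceptions to follow up, bearing in mind the re-spidering delay mentioned above.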
HTH
Cheers - Neil
--
PHP Database Mailing List (http://www.php.net/)