At 06:26 23/10/2005, you wrote:
Message-ID: <8d9a42800510221021l54d3ba35y111666680ac3b643@xxxxxxxxxxxxxx>
Date: Sat, 22 Oct 2005 13:21:26 -0400
From: Joseph Crawford <codebowl@xxxxxxxxx>
To: " Mailing List" <php-db@xxxxxxxxxxxxx>
Subject: Re: Re: Subject: Searching remote web sites for content
> why do all that,

Oh, it's far less work than the method you're proposing - you only
have one site to fopen(), not many dozens. There's no 'all that' to it
- it's the same method we're discussing, but more optimal (see point 3).

> if you know the address of the page that the link will
> reside on just curl that page for the results and preg_match that.

Ref the OP : "I ask them to nominate where the link back page is, and
I could check this manually. But is there a way to check whether the
remote page links back using a php script, so that I could get a
report and follow up on exceptions, without having to check all pages
that say they link to my site?"
Three reasons : 1 is because the nomination process might be poorly
understood by the nominee, or they could be inept and place the link
somewhere other than where they specified (or move it about once
nominated). You'd need to be able to crawl their entire site in order
to automate the scan on a regular basis, or you're back to "and I
could check this manually".
2 is that unless you want to write a very very robust parser, you may
as well rely on google's hard work writing such a parser. You can't
be sure *how* the referring webmaster has set up his links (re:inept)
so they could occur in a wide range of formats. The results from
google come in a regular format, so they're easy to parse - and you
said yourself you're not too certain of the regex you'd need - why
complicate it by having to cover dozens of eventualities ?
3 is that the point of the exercise is to ensure good SE rankings by
having referring links of high relevance. Only google knows how that
relevance ranking results in a search index placement based on link
popularity - and that includes using hidden links to 'spam' the
search engine, which you don't want.
So, relying on google to spider the remote site is a way to ensure
your QA process for the link referrals really does result in a usable
link:mysite index in the search engine - which of course is *the
whole point of the exercise* !
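
To put some flesh on point 2, here is a rough sketch of what the direct
check involves, assuming you fetch each nominated page yourself. It uses
PHP's DOMDocument rather than a regex, so quoting style, attribute order
and upper/lower case in the link don't matter. The function name
pageLinksTo() and the host www.example.com are made up for illustration;
this is only the core test on one page's HTML, not a finished crawler.

```php
<?php
// Hypothetical helper: true if $html contains an <a> element whose
// href points at the host $myHost (e.g. "www.example.com").
function pageLinksTo(string $html, string $myHost): bool
{
    // loadHTML() rejects an empty string, so treat it as "no link".
    if (trim($html) === '') {
        return false;
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them, then discard them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // parse_url() returns null for relative links with no host.
        $host = parse_url($anchor->getAttribute('href'), PHP_URL_HOST);
        if (is_string($host) && strcasecmp($host, $myHost) === 0) {
            return true;
        }
    }
    return false;
}
```

You'd feed it the page body from fopen()/file_get_contents() or curl for
each nominated URL. Note that it only tells you a link exists on that one
page; it can't tell you whether the link is hidden or how google weighs
it, which is point 3.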
HTH
Cheers - Neil