Re: Searching remote web sites for content

At 06:26 23/10/2005, Joseph Crawford <codebowl@xxxxxxxxx> wrote:

> why do all that,

Oh, it's far less work than the method you're proposing - you only have one site to fopen(), not many dozens. There's no 'all that' to it - it's the same method we're discussing, but more optimal (see point 3).

> if you know the address of the page that the link will
> reside on, just curl that page for the results and preg_match that.
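For concreteness, that approach boils down to something like the untested sketch below (the URLs are hypothetical placeholders):

<?php
// Untested sketch of the quoted approach: fetch the page the partner
// nominated and grep it for a link back. URLs are hypothetical.
$mysite    = 'www.mysite.example';
$nominated = 'http://www.partner.example/links.html';

$ch = curl_init($nominated);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the page as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow any redirects
$html = curl_exec($ch);
curl_close($ch);

// Crude match: an <a href> containing my domain. This is the fragile
// bit - it misses odd quoting, javascript links, frames and so on.
$pattern = '/<a[^>]+href=["\']?[^"\'>]*' . preg_quote($mysite, '/') . '/i';

if ($html !== false && preg_match($pattern, $html)) {
    echo "$nominated links back\n";
} else {
    echo "$nominated : no link found\n";
}
?>

Multiply that by every link partner, and rerun it whenever they shuffle their pages about - which is where the three reasons below come in.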


Ref the OP: "I ask them to nominate where the link back page is, and I could check this manually. But is there a way to check whether the remote page links back using a php script, so that I could get a report and follow up on exceptions, without having to check all pages that say they link to my site?"

Three reasons: 1 is because the nomination process might be poorly understood by the nominee, or they could be inept and place the link somewhere other than where they specified (or move it about once nominated). You'd need to be able to crawl their entire site in order to automate the scan on a regular basis, or you're back to "... and I could check this manually".

2 is that unless you want to write a very, very robust parser, you may as well rely on google's hard work writing such a parser. You can't be sure *how* the referring webmaster has set up his links (re: inept), so they could occur in a wide range of formats. The results from google come in a regular format, so they're easy to parse - and you said yourself you're not too certain of the regex you'd need - why complicate it by having to cover dozens of eventualities? (There's a sketch of the google check at the end of this mail.)

3 is that the point of the exercise is to ensure good SE rankings by having referring links of high relevance. Only google knows how that relevance ranking results in a search index placement based on link popularity - and that includes detecting hidden links used to 'spam' the search engine, which you don't want.

So, relying on google to spider the remote site is a way to ensure your QA process for the link referrals really does result in a usable link:mysite index in the search engine - which of course is *the whole point of the exercise*!
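To make the google check concrete, here's a rough, untested sketch. The domains are hypothetical, and the link: query behaviour and result markup are google's to change (scraping results may also breach their ToS), so treat all of that as assumptions to verify:

<?php
// Untested sketch: ask google for pages linking to my site, then look
// for the partner's domain in the results. Domains are hypothetical,
// and the link: query and result markup are assumptions to verify.
$mysite  = 'www.mysite.example';
$partner = 'www.partner.example';

$url = 'http://www.google.com/search?q=' . urlencode('link:' . $mysite);

// google tends to refuse PHP's default user-agent, so send one
$ctx = stream_context_create(array('http' => array(
    'header' => "User-Agent: Mozilla/4.0 (compatible; linkcheck)\r\n"
)));
$results = file_get_contents($url, false, $ctx);

// one regular format to check, instead of dozens of webmasters' HTML
if ($results !== false && stripos($results, $partner) !== false) {
    echo "$partner appears in google's link:$mysite results\n";
} else {
    echo "$partner not listed as a referrer - follow up manually\n";
}
?>

One fetch, one format to check - however many link partners you have.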

HTH
Cheers - Neil

