Book Sample Code Help

"Poppy Alexandra" <poppy334455@xxxxxxxxxxx> · Tue, 24 Jan 2006 21:54:26 +1000

Hi all. I am trying to run some PHP code I got from the O'Reilly book 
Spidering Hacks and it doesn't seem to be working. Please note, the 
functions were not inculded in the code, I had to copy it from the book so I 
might have made mistake. I thought perhaps an experienced PHP programmer 
might be able to pick the error. Does the code work for anyone else?

Thank you. Poppy

The error, and code are listed below:

phpserver$ php -q book-1.php
Parse error: parse error in book-1.php on line 90
phpserver$

#!/usr/bin/php -q

<?php

/* include the scraping functions script:
include( "scrape_func.php" );
I commented this out and included in this file for clarity
*/

function getURL( $pURL ) {
	$_data = null;
	if( $_http = fopen( $pURL, "r" ) ) {
		while( !feof( $_http ) ) {
			$_data .= fgets( $_http, 1024 );
		}
		fclose( $_http );
	}
	return( $_data );
}

function cleanString( $pString ) {
	$_data = str_replace( array( chr(10), chr(13), chr(9) ), chr(32), $pString 
);
		while( strpos( $_data, str_repeat( chr(32), 2 ), 0 ) != false ) {
			$_data = str_replace( str_repeat( chr(32), 2 ), chr(32), $_data );
		}
		return( trim( $_data ) );
}

function getBlock( $pStart, $pStop, $pSource, $pPrefix = true ) {
	$_data = null;
	$_start = strpos( strtolower( $pSource ), strtolower( $pStart ), 0 );
	$_start = ( $pPrefix == false ) ? $_start + strlen( $pStart ) : $_start;
	$_stop = strpos( strtolower( $pSource ), strtolower( $pStop ), $_start );
	if( $_start > strlen( $pElement ) && $_stop > $_start ) {
		$_data = trim( substr( $pSource, $_start, $_stop - $_start ) );
	}
	return( $_data );

function getElement( $pElement, $pSource ) {
	$_data = null;
	$pElement = strtolower( $pElement );
	$_start = strpos( strtolower( $pSource ), chr(60) . $pElement, 0 );
	$_start = strpos( $pSource, chr(62), $_start ) + 1;
	$_stop = strpos( strtolower( $pSource ), "</" . $pElement . chr(62), 
$_start );
	if( $_start > strlen( $pElement ) && $_stop > $_start ) {
		$_data = trim( substr( $pSource, $_start, $_stop - $_start ) );
	}
	return( $_data );
}

/* Next, we'll get the raw source code of
  the page using our getURL(  ) function:  */
$_rawData = getURL( "http://www.techdeals.net/"; );

/* And clean up the raw source for easier parsing:  */
$_rawData = cleanString( $_rawData );

/* The next step is a little more complex. Because we've already
  looked at the HTML source, we know that the items start and
  end with two particular strings. We'll use these strings to
  get the main data portion of the page:*/
$_rawData = getBlock( "<div class=\"NewsHeader\">",
                     "</div> <div id=\"MenuContainer\">", $_rawData );

/* We now have the particular data that we want to parse into
  an itemized list. We do that by breaking the code into an
  array so we can loop through each item: */
$_rawData = explode( "<div class=\"NewsHeader\">", $_rawData );

/* While iterating through each value, we
  parse out the individual item portions:  */

foreach( $_rawData as $_rawBlock ) {
  $_item = array(  );
  $_rawBlock = trim( $_rawBlock );
  if( strlen( $_rawBlock ) > 0 ) {

     /*   The title of the item can be found in <h2> ... </h2> tags   */
     $_item[ "title" ] = strip_tags( getElement( "h2", $_rawBlock ) );

     /*   The link URL can is found between
          http://www.techdeals.net/rd/go.php?id= and "   */
     $_item[ "link" ] = getBlock( "http://www.techdeals.net/rd/go.php?id=";,
                                  chr(34), $_rawBlock );

     /*   Posting info is in <span> ... </span> tags   */
     $_item[ "post" ] = strip_tags( getElement( "span", $_rawBlock ) );

     /*   The description is found between an </div> and a <img tag   */
     $_item[ "desc" ] = cleanString( strip_tags( getBlock( "</div>",
                                     "<img", $_rawBlock ) ) );

     /*   Some descriptions are slightly different,
          so we need to clean them up a bit   */
     if( strpos( $_item[ "desc" ], "Click here for the techdeal", 0 ) > 0 ) 
{
        $_marker = strpos( $_item[ "desc" ], "Click here for the techdeal", 
0 );
        $_item[ "desc" ] = trim( substr( $_item[ "desc" ], 0, $_marker ) );
     }

     /*   Print out the scraped data   */
     print( implode( chr(10), $_item ) . chr(10) . chr(10) );

     /*   Save the data as a string (used in the mail example below)   */
     $_text .= implode( chr(10), $_item ) . chr(10) . chr(10);
  }
}

?>

_________________________________________________________________
Ever wanted to be a TV Stylist? Win Your Dream Job at MyCareer 
http://dreamjob.mycareer.com.au/?s_cid=213322

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php