Hi all. I am trying to run some PHP code I got from the O'Reilly book
Spidering Hacks, and it doesn't seem to be working. Please note that the
functions were not included in the code, so I had to copy them from the
book and I might have made a mistake. I thought perhaps an experienced PHP
programmer might be able to spot the error. Does the code work for anyone else?
Thank you. Poppy
The error and the code are listed below:
phpserver$ php -q book-1.php
Parse error: parse error in book-1.php on line 90
phpserver$
#!/usr/bin/php -q
<?php
/* include the scraping functions script:
include( "scrape_func.php" );
I commented this out and included in this file for clarity
*/
function getURL( $pURL ) {
$_data = null;
if( $_http = fopen( $pURL, "r" ) ) {
while( !feof( $_http ) ) {
$_data .= fgets( $_http, 1024 );
}
fclose( $_http );
}
return( $_data );
}
function cleanString( $pString ) {
$_data = str_replace( array( chr(10), chr(13), chr(9) ), chr(32), $pString );
while( strpos( $_data, str_repeat( chr(32), 2 ), 0 ) !== false ) {
$_data = str_replace( str_repeat( chr(32), 2 ), chr(32), $_data );
}
return( trim( $_data ) );
}
function getBlock( $pStart, $pStop, $pSource, $pPrefix = true ) {
$_data = null;
$_start = strpos( strtolower( $pSource ), strtolower( $pStart ), 0 );
$_start = ( $pPrefix == false ) ? $_start + strlen( $pStart ) : $_start;
$_stop = strpos( strtolower( $pSource ), strtolower( $pStop ), $_start );
if( $_start !== false && $_stop > $_start ) {
$_data = trim( substr( $pSource, $_start, $_stop - $_start ) );
}
return( $_data );
function getElement( $pElement, $pSource ) {
$_data = null;
$pElement = strtolower( $pElement );
$_start = strpos( strtolower( $pSource ), chr(60) . $pElement, 0 );
$_start = strpos( $pSource, chr(62), $_start ) + 1;
$_stop = strpos( strtolower( $pSource ), "</" . $pElement . chr(62),
$_start );
if( $_start > strlen( $pElement ) && $_stop > $_start ) {
$_data = trim( substr( $pSource, $_start, $_stop - $_start ) );
}
return( $_data );
}
/* Next, we'll get the raw source code of
the page using our getURL( ) function: */
$_rawData = getURL( "http://www.techdeals.net/" );
/* And clean up the raw source for easier parsing: */
$_rawData = cleanString( $_rawData );
/* The next step is a little more complex. Because we've already
looked at the HTML source, we know that the items start and
end with two particular strings. We'll use these strings to
get the main data portion of the page:*/
$_rawData = getBlock( "<div class=\"NewsHeader\">",
"</div> <div id=\"MenuContainer\">", $_rawData );
/* We now have the particular data that we want to parse into
an itemized list. We do that by breaking the code into an
array so we can loop through each item: */
$_rawData = explode( "<div class=\"NewsHeader\">", $_rawData );
/* While iterating through each value, we
parse out the individual item portions: */
foreach( $_rawData as $_rawBlock ) {
$_item = array( );
$_rawBlock = trim( $_rawBlock );
if( strlen( $_rawBlock ) > 0 ) {
/* The title of the item can be found in <h2> ... </h2> tags */
$_item[ "title" ] = strip_tags( getElement( "h2", $_rawBlock ) );
/* The link URL is found between
http://www.techdeals.net/rd/go.php?id= and " */
$_item[ "link" ] = getBlock( "http://www.techdeals.net/rd/go.php?id=",
chr(34), $_rawBlock );
/* Posting info is in <span> ... </span> tags */
$_item[ "post" ] = strip_tags( getElement( "span", $_rawBlock ) );
/* The description is found between an </div> and a <img tag */
$_item[ "desc" ] = cleanString( strip_tags( getBlock( "</div>",
"<img", $_rawBlock ) ) );
/* Some descriptions are slightly different,
so we need to clean them up a bit */
if( strpos( $_item[ "desc" ], "Click here for the techdeal", 0 ) > 0 )
{
$_marker = strpos( $_item[ "desc" ], "Click here for the techdeal",
0 );
$_item[ "desc" ] = trim( substr( $_item[ "desc" ], 0, $_marker ) );
}
/* Print out the scraped data */
print( implode( chr(10), $_item ) . chr(10) . chr(10) );
/* Save the data as a string (used in the mail example below) */
$_text .= implode( chr(10), $_item ) . chr(10) . chr(10);
}
}
?>
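
One thing that stands out when re-reading the listing: getBlock( ) does not
appear to be closed with a brace before getElement( ) begins, so PHP may be
treating the rest of the file as part of getBlock( ). Whether that is what
produces the parse error is unconfirmed. Below is a minimal sketch of
getBlock( ) with the brace in place, modelled on the shape of getElement( ).
It is a guess, not the book's text: the explicit "not found" checks and the
sample string are assumptions added for illustration.

```php
<?php
/* Sketch only: getBlock( ) with its closing brace in place.
   The start/stop checks are an assumption, not the book's text. */
function getBlock( $pStart, $pStop, $pSource, $pPrefix = true ) {
    $_data = null;
    $_lower = strtolower( $pSource );
    $_start = strpos( $_lower, strtolower( $pStart ) );
    if( $_start === false ) {
        return( $_data );   /* start marker not found */
    }
    if( $pPrefix == false ) {
        $_start += strlen( $pStart );   /* skip past the start marker */
    }
    $_stop = strpos( $_lower, strtolower( $pStop ), $_start );
    if( $_stop !== false && $_stop > $_start ) {
        $_data = trim( substr( $pSource, $_start, $_stop - $_start ) );
    }
    return( $_data );
}   /* this brace closes getBlock( ) before getElement( ) starts */

/* Made-up sample input, just to exercise the function */
$_sample = "<b>Deal:</b> cheap RAM <img src=\"x.gif\">";
print( getBlock( "<b>", "<img", $_sample ) . chr(10) );
print( getBlock( "</b>", "<img", $_sample, false ) . chr(10) );
```

With $pPrefix left at true the start marker is kept in the result; passing
false drops it, which is how the script above would need to call it if only
the text between the two markers is wanted.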