Shawn McKenzie wrote:
> fschnittke@xxxxxxxxxxxxx wrote:
>> Hi:
>>
>> Newbie here. This is my first attempt at PHP scripting. I'm trying to find
>> an alternative to Lotus Domino's domlog.nsf for logging web transactions.
>> Domino does create an Apache compatible text file of the web transactions,
>> and this is what I’m trying to parse. I started off using a code snippet I
>> found on the web. I modified it a little bit to suit my needs. It was
>> working fine with the small 600k test log file I was using, but since I’ve
>> moved to the larger 18 MB production log file here’s what happens:
>>
>> I’ve modified the code and added an echo statement to echo each loop that
>> gets processed. Initially it starts off very fast but then performance
>> becomes very slow, to a point where I can count each loop as it’s being
>> processed. It’s taking a little over 3 hours to parse the entire file. I
>> figured it was a disk cache thing, so I created a ram drive. This has
>> improved the performance, but is still taking an hour to parse.
>>
>> Here is the PHP script I’m using:
>>
>>
>> <?php
>>
> Why read the file into an array, implode it to a string, and then split it
> into an array again? Just use file_get_contents() and split it, or use
> file() and then run your preg_replace("/(\r|\t)/", "", ...) on the array.
>> $ac_arr = file('access_log');
>> $astring = join("", $ac_arr);
>> $astring = preg_replace("/(\r|\t)/", "", $astring);
>> $records = preg_split("/(\n)/", $astring, -1, PREG_SPLIT_NO_EMPTY);
>>
>> $sizerecs = sizeof($records);
>>
>> // now split into records
>> $i = 1;
>> $each_rec = 0;
>>
> Why not foreach($records as $all) ?
>> while($i<$sizerecs) {
>>   $all = $records[$i];
>>
> All of these $all = str_replace() calls and the other str_replace() calls
> are probably killing you. Rethink it so that you extract the data instead
> of finding it and then replacing it.
>>   // IP Address ($IP):
>>   $IP = substr($all, 0, strpos($all, " "));
>>   $all = str_replace($IP, "", $all);
>>
>>   //Remote User ($RU):
>>   $string = substr($all, 0, strpos($all, " [")); // www.vpcl.on.ca T123
>>   $sstring = substr($string, strpos($string, " ")+1);
>>   $AUstring = substr($sstring, strpos($sstring, " "));
>>   $RU = preg_replace("/\"/", "", $AUstring);
>>   $RU = trim($RU);
>>   $all = str_replace($string, "", $all);
>>
>>   //Request Time Stamp ($RTS):
>>   preg_match("/\[(.+)\]/", $all, $match);
>>   $RTS = $match[1];
>>   $all = str_replace(" [$RTS] \"", "", $all);
>>
>>   //Http Request Line ($HRL):
>>   $string = substr($all, 0, strpos($all, "\"")+2);
>>   $HRL = str_replace("\"", "", $string);
>>   $all = str_replace($string, "", $all);
>>
>>   //Http Response Status Code (HRSC):
>>   $HRSC = trim(substr($all, 0, strpos($all, " ")+1));
>>   $all = str_replace($HRSC, "", $all);
>>
>>   //Request Content Length (RCL):
>>   $string = substr($all, 0, strpos($all, "\"")+1);
>>   $RCL = trim(str_replace("\"", "", $string));
>>   $all = str_replace($string, "", $all);
>>
>>   //Referring URL (RefU):
>>   $string = substr($all, 0, strpos($all, "\"")+3);
>>   $RefU = substr($all, 0, strpos($all, "\""));
>>   $all = str_replace($string, "", $all);
>>
>>   //User Agent (UA):
>>   $string = substr($all, 0, strpos($all, "\"")+2);
>>   $UA = substr($all, 0, strpos($all, "\""));
>>   $all = str_replace($string, "", $all);
>>
>>   //Time to Process Request:
>>
>>   #$new_format[$each_rec] = "$UA\n";
>>   $new_format[$each_rec] =
>>     "$IP\t$RU\t$RTS\t$HRL\t$HRSC\t$RCL\t$RefU\t$UA\t$all\n";
>>
> Each time through the above loop you add a $new_format[$each_rec] entry,
> and then here you reopen the file and loop through every entry collected
> so far. I think if you just move this block to after the end of the loop
> it will make a drastic improvement.
>>   $fhandle = fopen("/ramdrive/import_file.txt", "w");
>>   foreach($new_format as $data) {
>>     fputs($fhandle, "$data");
>>   }
>>   fclose($fhandle);
>>
>>   // advance to next record
>>   echo "$i\n";
>>   $i = $i + 1;
>>
>>   $each_rec++;
>> }
> $fhandle = fopen("/ramdrive/import_file.txt", "w");
> foreach($new_format as $data) {
>   fputs($fhandle, "$data");
> }
> fclose($fhandle);
>
>> ?>
>>
>>
>> This is running on a Toshiba Tecra A4 Laptop with FreeBSD 7.0 Release.
>> Plenty of RAM and HDD space. The PHP Version is:
>>
>> PHP 5.2.5 with Suhosin-Patch 0.9.6.2 (cli) (built: Feb 11 2009 09:28:47)
>> Copyright (c) 1997-2007 The PHP Group
>> Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
>>
>> What should I do to get this script to run faster?
>>
>> Any help is appreciated….
>>
>> Regards,
>>
>>
>>
>> Fred Schnittke
>>
>>
>> ----------------------------
>> Powered by Execulink Webmail
>> http://www.execulink.com/
>>
>

I see that Paul replied and we say the same things, so here are the two
best approaches to speed it up, ignoring all the str_replace() calls, etc.
that need to be gotten rid of:

// option 1 - I would assume this to be the most efficient / fastest

// FILE_IGNORE_NEW_LINES drops the trailing newline from each line, so the
// "\n" appended below does not produce blank lines in the output
$ac_arr = file('access_log', FILE_IGNORE_NEW_LINES);
$records = preg_replace("/(\r|\t)/", "", $ac_arr);

$fhandle = fopen("/ramdrive/import_file.txt", "w");

foreach($records as $all) {
  // manipulate your data

  // if you actually need the array then assign it here
  // $new_format[] =
  //   "$IP\t$RU\t$RTS\t$HRL\t$HRSC\t$RCL\t$RefU\t$UA\t$all\n"
  // else just do this
  fputs($fhandle, "$IP\t$RU\t$RTS\t$HRL\t$HRSC\t$RCL\t$RefU\t$UA\t$all\n");
}
fclose($fhandle);

// option 2 - if you don't need the array, just create a large string

$ac_arr = file('access_log', FILE_IGNORE_NEW_LINES);
$records = preg_replace("/(\r|\t)/", "", $ac_arr);
$new_format = "";

foreach($records as $all) {
  // manipulate your data
  $new_format .= "$IP\t$RU\t$RTS\t$HRL\t$HRSC\t$RCL\t$RefU\t$UA\t$all\n";
}
file_put_contents("/ramdrive/import_file.txt", $new_format);

--
Thanks!
-Shawn
http://www.spidean.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
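
For the "manipulate your data" step, here is a rough sketch of what
"extract the data instead of finding it and then replacing it" could look
like: one preg_match() per line pulls out every field at once, and each
record is written as soon as it is parsed. The function names
parse_log_line() and convert_log() are made up for illustration, and the
regular expression is only an assumption about the layout (IP, host, user,
[timestamp], "request", status, bytes, "referrer", "user agent", then
whatever is left). It is untested against a real Domino log and would not
cope with escaped quotes inside a field, so treat it as a starting point.

```php
<?php
// Hypothetical helper: extract all fields of one log line with a single
// regular expression. The pattern is an ASSUMPTION about the log layout.
function parse_log_line($line)
{
    // Strip line endings and tabs, as the original script did with \r and \t.
    $line = str_replace(array("\r", "\n", "\t"), '', $line);

    $pattern = '/^(\S+) (\S+) (.*?) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) '
             . '"([^"]*)" "([^"]*)"\s*(.*)$/';

    if (!preg_match($pattern, $line, $m)) {
        return null; // line does not match the assumed layout
    }

    // $m[2] (the host / ident field) is skipped, as in the original script.
    return array(
        'ip'       => $m[1],
        'user'     => trim($m[3], '" '),
        'time'     => $m[4],
        'request'  => $m[5],
        'status'   => $m[6],
        'bytes'    => $m[7],
        'referrer' => $m[8],
        'agent'    => $m[9],
        'rest'     => $m[10], // anything after the user agent
    );
}

// Hypothetical helper: read the log one line at a time and write each
// tab-separated record immediately, so nothing accumulates in memory and
// the output file is opened only once.
function convert_log($inPath, $outPath)
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        return false;
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        fclose($in);
        return false;
    }

    $written = 0;
    while (($line = fgets($in)) !== false) {
        $fields = parse_log_line($line);
        if ($fields === null) {
            continue; // skip lines that do not parse
        }
        fwrite($out, implode("\t", $fields) . "\n");
        $written++;
    }

    fclose($in);
    fclose($out);
    return $written; // number of records written
}
```

With the paths from the original script, the call would be
convert_log('access_log', '/ramdrive/import_file.txt'). Reading with
fgets() instead of file() is a design choice rather than a requirement: it
keeps memory use flat regardless of log size, at the cost of not having
the whole array available afterwards.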