fschnittke@xxxxxxxxxxxxx wrote:
> Hi:
>
> Newbie here. This is my first attempt at PHP scripting. I'm trying to find
> an alternative to Lotus Domino's domlog.nsf for logging web transactions.
> Domino does create an Apache compatible text file of the web transactions,
> and this is what I’m trying to parse. I started off using a code snippet I
> found on the web. I modified it a little bit to suit my needs. It was
> working fine with the small 600k test log file I was using, but since I’ve
> moved to the larger 18Mb production log file here’s what happens:
>
> I’ve modified the code and added an echo statement to echo each loop that
> gets processed. Initially it starts off very fast but then performance
> becomes very slow, to a point where I can count each loop as it’s being
> processed. It’s taking a little over 3 hours to parse the entire file. I
> figured it was a disk cache thing, so I created a ram drive. This has
> improved the performance, but is still taking an hour to parse.
>
> Here is the PHP script I’m using:
>
>
> <?php
>

Why read the file into an array, implode it to a string, and then split it into an array again? Just use file_get_contents() and split the result, or use file() and then run preg_replace("/(\r|\t)/", "", ...) on the array.

> $ac_arr = file('access_log');
> $astring = join("", $ac_arr);
> $astring = preg_replace("/(\r|\t)/", "", $astring);
> $records = preg_split("/(\n)/", $astring, -1, PREG_SPLIT_NO_EMPTY);
>
> $sizerecs = sizeof($records);
>
> // now split into records
> $i = 1;
> $each_rec = 0;
>

Why not foreach($records as $all) ?

> while($i<$sizerecs) {
>     $all = $records[$i];
>

All of these $all = str_replace() calls and the other str_replace() calls are probably killing you. Rethink this so that you extract the data directly instead of finding it and then replacing it.
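One way to extract rather than replace is a single preg_match() per line. This is only a rough sketch. It assumes each line has the layout IP, host, remote user, [timestamp], "request", status, length, "referrer", "user agent", plus an optional trailing value, which is what the quoted script seems to expect. The function name parse_log_line() and the sample line are made up for illustration, and the pattern should be checked against real Domino output (for example, it does not handle escaped quotes inside the user agent).

```php
<?php
// Hypothetical helper (not from the original script): pull all fields out of
// one log line with a single preg_match() instead of repeated str_replace().
// ASSUMPTION: lines look like
//   IP host user [timestamp] "request" status bytes "referrer" "agent" extra
function parse_log_line($line)
{
    $pattern = '/^(\S+) (\S+) (.*?) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) '
             . '"([^"]*)" "([^"]*)"(.*)$/';
    if (!preg_match($pattern, rtrim($line, "\r\n"), $m)) {
        return false; // line does not fit the assumed layout
    }
    return array(
        'IP'   => $m[1],
        'RU'   => trim($m[3], '"'),  // remote user, surrounding quotes removed
        'RTS'  => $m[4],
        'HRL'  => $m[5],
        'HRSC' => $m[6],
        'RCL'  => $m[7],
        'RefU' => $m[8],
        'UA'   => $m[9],
        'rest' => trim($m[10]),      // whatever follows the user agent
    );
}

// Made-up sample line, for illustration only.
$sample = '192.0.2.10 www.example.org "T123" [11/Feb/2009:09:28:47 -0500] '
        . '"GET /index.html HTTP/1.1" 200 1234 "http://www.example.org/" '
        . '"Mozilla/5.0" 15';
$fields = parse_log_line($sample);
echo implode("\t", $fields), "\n";
```

Because every field comes out of one match, $all never has to be rewritten, and a line that does not fit the pattern returns false so it can be logged or skipped.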
>     // IP Address ($IP):
>     $IP = substr($all, 0, strpos($all, " "));
>     $all = str_replace($IP, "", $all);
>
>     //Remote User ($RU):
>     $string = substr($all, 0, strpos($all, " [")); // www.vpcl.on.ca T123
>     $sstring = substr($string, strpos($string, " ")+1);
>     $AUstring = substr($sstring, strpos($sstring, " "));
>     $RU = preg_replace("/\"/", "", $AUstring);
>     $RU = trim($RU);
>     $all = str_replace($string, "", $all);
>
>     //Request Time Stamp ($RTS):
>     preg_match("/\[(.+)\]/", $all, $match);
>     $RTS = $match[1];
>     $all = str_replace(" [$RTS] \"", "", $all);
>
>     //Http Request Line ($HRL):
>     $string = substr($all, 0, strpos($all, "\"")+2);
>     $HRL = str_replace("\"", "", $string);
>     $all = str_replace($string, "", $all);
>
>     //Http Response Status Code (HRSC):
>     $HRSC = trim(substr($all, 0, strpos($all, " ")+1));
>     $all = str_replace($HRSC, "", $all);
>
>     //Request Content Length (RCL):
>     $string = substr($all, 0, strpos($all, "\"")+1);
>     $RCL = trim(str_replace("\"", "", $string));
>     $all = str_replace($string, "", $all);
>
>     //Referring URL (RefU):
>     $string = substr($all, 0, strpos($all, "\"")+3);
>     $RefU = substr($all, 0, strpos($all, "\""));
>     $all = str_replace($string, "", $all);
>
>     //User Agent (UA):
>     $string = substr($all, 0, strpos($all, "\"")+2);
>     $UA = substr($all, 0, strpos($all, "\""));
>     $all = str_replace($string, "", $all);
>
>     //Time to Process Request:
>
>     #$new_format[$each_rec] = "$UA\n";
>     $new_format[$each_rec] =
>         "$IP\t$RU\t$RTS\t$HRL\t$HRSC\t$RCL\t$RefU\t$UA\t$all\n";
>

Each time through the above loop you add a $new_format[$each_rec] and then here you are looping through each one of those. I think if you just move this to the end it will make a drastic improvement.
>     $fhandle = fopen("/ramdrive/import_file.txt", "w");
>     foreach($new_format as $data) {
>         fputs($fhandle, "$data");
>     }
>     fclose($fhandle);
>
>     // advance to next record
>     echo "$i\n";
>     $i = $i + 1;
>
>     $each_rec++;
> }

$fhandle = fopen("/ramdrive/import_file.txt", "w");
foreach($new_format as $data) {
    fputs($fhandle, "$data");
}
fclose($fhandle);

> ?>
>
>
> This is running on a Toshiba Tecra A4 Laptop with FreeBSD 7.0 Release.
> Plenty of RAM and HDD space. The PHP Version is:
>
> PHP 5.2.5 with Suhosin-Patch 0.9.6.2 (cli) (built: Feb 11 2009 09:28:47)
> Copyright (c) 1997-2007 The PHP Group
> Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
>
> What should I do to get this script to run faster?
>
> Any help is appreciated….
>
> Regards,
>
>
>
> Fred Schnittke
>
>
> ----------------------------
> Powered by Execulink Webmail
> http://www.execulink.com/
>

--
Thanks!
-Shawn
http://www.spidean.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
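
P.S. Going one step beyond moving the write to the end: with an 18Mb file it may also help not to build $new_format at all, and instead read a line, convert it, and write it straight out. A minimal sketch of that idea, not a drop-in replacement. The names convert_log() and spaces_to_tabs() are hypothetical, the converter is a stand-in for the real field extraction, and the demo uses temporary files rather than the real access_log.

```php
<?php
// Hypothetical helper: stream $inPath to $outPath one line at a time.
// $convert is the name of a function that takes one raw line and returns
// the line to write, or false to skip it.
function convert_log($inPath, $outPath, $convert)
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        return false;
    }
    $out = fopen($outPath, 'w');   // opened once, before the loop
    if ($out === false) {
        fclose($in);
        return false;
    }
    $count = 0;
    while (($line = fgets($in)) !== false) {
        $result = call_user_func($convert, rtrim($line, "\r\n"));
        if ($result === false) {
            continue;              // skip lines the converter rejects
        }
        fwrite($out, $result . "\n");
        $count++;
    }
    fclose($in);
    fclose($out);                  // closed once, after the loop
    return $count;                 // number of lines written
}

// Stand-in converter for the demo: skip empty lines, turn spaces into tabs.
function spaces_to_tabs($line)
{
    return $line === '' ? false : str_replace(' ', "\t", $line);
}

// Demo on throwaway files with made-up content.
$src = tempnam(sys_get_temp_dir(), 'log');
$dst = tempnam(sys_get_temp_dir(), 'out');
file_put_contents($src, "a b c\n\nd e f\n");
$written = convert_log($src, $dst, 'spaces_to_tabs');
echo $written, " lines written\n";
```

The output file is opened once and each record is written once, so the work stays proportional to the number of lines, and memory use no longer grows with the size of the log.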