Re: File Write Operation Slows to a Crawl....

fschnittke@xxxxxxxxxxxxx wrote:
> Hi:
> 
> Newbie here. This is my first attempt at PHP scripting. I'm trying to find
> an alternative to Lotus Domino's domlog.nsf for logging web transactions.
> Domino does create an Apache compatible text file of the web transactions,
> and this is what I’m trying to parse. I started off using a code snippet I
> found on the web. I modified it a little bit to suit my needs. It was
> working fine with the small 600k test log file I was using, but since I’ve
> moved to the larger 18Mb production log file here’s what happens:
> 
> I’ve modified the code and added an echo statement to echo each loop that
> gets processed. Initially it starts off very fast but then performance
> becomes very slow, to a point where I can count each loop as it’s being
> processed. It’s taking a little over 3 hours to parse the entire file. I
> figured it was a disk cache thing, so I created a ram drive. This has
> improved the performance, but is still taking an hour to parse.
> 
> Here is the PHP script I’m using:
> 
> 
> <?php
> 
Why read the file into an array, implode it to a string, and then split it
back into an array?  Either use file_get_contents() and split the result, or
use file() and run preg_replace("/(\r|\t)/", "", $ac_arr) directly on the
array; preg_replace() accepts an array subject.
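
A minimal sketch of the file() route, in case it helps. The function name
read_log_lines() is made up for illustration and is not part of the original
script; the blank-line filter is there to mimic PREG_SPLIT_NO_EMPTY.

```php
<?php
// Hypothetical helper (not from the original script): read a log file
// straight into an array of cleaned lines, with no join/split round trip.
function read_log_lines($path)
{
    // FILE_IGNORE_NEW_LINES strips the trailing "\n" from each element.
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return array();
    }
    // preg_replace() applied to an array returns an array.
    $lines = preg_replace("/(\r|\t)/", "", $lines);
    // Drop lines that are empty after cleaning, then reindex from 0.
    return array_values(array_filter($lines, 'strlen'));
}
```

With this, $records = read_log_lines('access_log'); would stand in for the
four lines that build $records in the quoted script.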
> $ac_arr = file('access_log');
> $astring = join("", $ac_arr);
> $astring = preg_replace("/(\r|\t)/", "", $astring);
> $records = preg_split("/(\n)/", $astring, -1, PREG_SPLIT_NO_EMPTY);
> 
> $sizerecs = sizeof($records);
> 
> // now split into records
> $i = 1;
> $each_rec = 0;
> 
Why not foreach ($records as $all)?  Note also that starting at $i = 1
skips the first record, $records[0].
> while($i<$sizerecs) {
> $all = $records[$i];
> 
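All of these $all = str_replace() calls, and the other str_replace() calls,
are probably killing you.  Each one rescans the whole line, and because
str_replace() removes every occurrence rather than just the first, it can
also strip matching text from later fields.  Rethink this so that you
extract the data directly instead of finding it and then replacing it.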
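
One way to extract rather than find-and-replace is a single preg_match() per
line. This is only a sketch: it assumes the usual Apache "combined" layout
(host, ident, user, [time], "request", status, bytes, "referer", "agent")
with anything left over captured as a trailing field. The Domino file may
order its fields differently, so the pattern will likely need adjusting.
The function name and array keys are made up for illustration.

```php
<?php
// Hypothetical helper: pull every field out of one log line in one pass.
// Assumes Apache combined format; returns null if the line does not match.
function parse_log_line($line)
{
    $pattern = '/^(\S+) (\S+) (.*?) \[([^\]]+)\] "([^"]*)" (\S+) (\S+) '
             . '"([^"]*)" "([^"]*)"\s*(.*)$/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return array(
        'ip'      => $m[1],
        'user'    => trim($m[3], '" '), // drop quotes around the user name
        'time'    => $m[4],
        'request' => $m[5],
        'status'  => $m[6],
        'bytes'   => $m[7],
        'referer' => $m[8],
        'agent'   => $m[9],
        'extra'   => $m[10],            // whatever follows the user agent
    );
}
```

implode("\t", $fields) . "\n" on the returned array then gives a
tab-separated output line, and lines that return null can be logged or
skipped.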
> // IP Address ($IP):
> $IP = substr($all, 0, strpos($all, " "));
> $all = str_replace($IP, "", $all);
> 
> //Remote User ($RU):
> $string = substr($all, 0, strpos($all, " [")); // www.vpcl.on.ca T123
> $sstring = substr($string, strpos($string, " ")+1);
> $AUstring = substr($sstring, strpos($sstring, " "));
> $RU = preg_replace("/\"/", "", $AUstring);
> $RU = trim($RU);
> $all = str_replace($string, "", $all);
> 
> //Request Time Stamp ($RTS):
> preg_match("/\[(.+)\]/", $all, $match);
> $RTS = $match[1];
> $all = str_replace(" [$RTS] \"", "", $all);
> 
> //Http Request Line ($HRL):
> $string = substr($all, 0, strpos($all, "\"")+2);
> $HRL = str_replace("\"", "", $string);
> $all = str_replace($string, "", $all);
> 
> //Http Response Status Code (HRSC):
> $HRSC = trim(substr($all, 0, strpos($all, " ")+1));
> $all = str_replace($HRSC, "", $all);
> 
> //Request Content Length (RCL):
> $string = substr($all, 0, strpos($all, "\"")+1);
> $RCL = trim(str_replace("\"", "", $string));
> $all = str_replace($string, "", $all);
> 
> //Referring URL (RefU):
> $string = substr($all, 0, strpos($all, "\"")+3);
> $RefU = substr($all, 0, strpos($all, "\""));
> $all = str_replace($string, "", $all);
> 
> //User Agent (UA):
> $string = substr($all, 0, strpos($all, "\"")+2);
> $UA = substr($all, 0, strpos($all, "\""));
> $all = str_replace($string, "", $all);
> 
> //Time to Process Request:
> 
> #$new_format[$each_rec] = "$UA\n";
> $new_format[$each_rec] =
> "$IP\t$RU\t$RTS\t$HRL\t$HRSC\t$RCL\t$RefU\t$UA\t$all\n";
> 
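Each time through the above loop you add one element to $new_format, and
then right here, still inside the loop, you reopen the output file with "w"
(which truncates it) and rewrite every record collected so far.  The total
work therefore grows with the square of the number of records, which is why
it starts fast and then crawls.  I think if you just move this block to
after the end of the loop it will make a drastic improvement.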
> $fhandle = fopen("/ramdrive/import_file.txt", "w");
>   foreach($new_format as $data) {
>     fputs($fhandle, "$data");
>     }
>   fclose($fhandle);
> 
> // advance to next record
> echo "$i\n";
> $i = $i + 1;
> 
> $each_rec++;
> }
$fhandle = fopen("/ramdrive/import_file.txt", "w");
   foreach($new_format as $data) {
     fputs($fhandle, "$data");
   }
fclose($fhandle);
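
If memory ever becomes a concern with bigger logs, another option is to open
both files once and write each converted line as soon as it is read, so
nothing accumulates in an array. A rough sketch; stream_convert() and its
$transform callback are names I made up, and the callback stands in for
whatever per-line parsing you end up with.

```php
<?php
// Hypothetical helper: read $inPath line by line, pass each non-empty line
// through $transform, and write the result to $outPath.
// Returns the number of lines written, or false if a file cannot be opened.
function stream_convert($inPath, $outPath, $transform)
{
    $in = fopen($inPath, "r");
    if ($in === false) {
        return false;
    }
    $out = fopen($outPath, "w"); // opened once, before the loop
    if ($out === false) {
        fclose($in);
        return false;
    }
    $count = 0;
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === "") {
            continue; // skip blank lines
        }
        fwrite($out, call_user_func($transform, $line) . "\n");
        $count++;
    }
    fclose($in);
    fclose($out);
    return $count;
}
```

For example, stream_convert('access_log', '/ramdrive/import_file.txt',
'my_line_converter') would do the whole job in one pass, where
my_line_converter is a placeholder for your own function.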

> ?>
> 
> 
> This is running on a Toshiba Tecra A4 Laptop with FreeBSD 7.0 Release.
> Plenty of RAM and HDD space. The PHP Version is:
> 
> PHP 5.2.5 with Suhosin-Patch 0.9.6.2 (cli) (built: Feb 11 2009 09:28:47)
> Copyright (c) 1997-2007 The PHP Group
> Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
> 
> What should I do to get this script to run faster?
> 
> Any help is appreciated….
> 
> Regards,
> 
> 
> 
> Fred Schnittke
> 
> 
> ----------------------------
> Powered by Execulink Webmail
> http://www.execulink.com/
> 


-- 
Thanks!
-Shawn
http://www.spidean.com

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

