Robert Cummings wrote:
On Sat, 2008-04-05 at 19:09 +0100, Steve McGill wrote:
"Richard Heyes" <richardh@xxxxxxxxxxx> wrote in message
news:47F75D2A.3020101@xxxxxxxxxxxxxx
Thanks for the heads up on fgetc() incrementing by one. I hadn't actually
tested that code yet, I was using the original fseek($handle,$pos).
strpos() would be ideal, but it needs to work on a string and not a file. I
don't want to load a 100 MB file into memory if I don't have to. Perhaps I
should test how quick the fgets() and ftell() method is, because at least
it loads in one line at a time.
Does anybody know any other ways to go about the problem?
Haven't read the rest of the thread, and so going by the subject alone,
fgets() finishes when it encounters a newline, so you can use this
wondrous fact to seek to a specific line:
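<?php
$fp = fopen('filename', 'r');
$num = 18; // Desired line number (1-based)
// Each fgets() call consumes one line, so the $num-th call returns line $num.
for ($i = 0; $i < $num; $i++) {
    $line = fgets($fp);
}
echo $line;
fclose($fp);
?>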
It works because fgets() stops when it encounters a newline (\n). So it's
just a case of counting the calls to fgets().
fgets() would work but as I'm constantly jumping around a 500,000 line file
I thought it was better to maintain a cache of line number positions.
As a final update to anybody following:
- Taking away the unnecessary fseek() made the script execute in 63 seconds
- Using a buffer system (reading in 1 MB of the text file at a time and then
looping through the string in memory) made the script execute in 36 seconds.
Huge improvement, but...
- Porting the code to C++, doing a shell_exec and reading the results back
in to PHP, took less than 2 seconds.
As fgetc() etc. are all effectively C wrappers, I was quite surprised at the
speed increase.
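The buffered approach described above (read a chunk, then scan the string in memory) could look roughly like the sketch below. It is not the poster's actual code: the function name indexLineOffsets(), the 1 MB default chunk size, and the php://memory demo data are assumptions made for illustration. It records the byte offset at which each line starts, using strpos() on each chunk rather than on the whole file.

```php
<?php
// Sketch only: index line-start offsets by scanning fixed-size chunks in memory.
// indexLineOffsets() and the default chunk size are illustrative assumptions.
function indexLineOffsets($fp, int $chunkSize = 1048576): array
{
    $offsets = [1 => 0];   // line 1 starts at byte 0
    $base = 0;             // file offset of the first byte of the current chunk
    $line = 1;
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $pos = 0;
        // strpos() only ever sees one chunk, so the whole file is never in memory.
        while (($nl = strpos($chunk, "\n", $pos)) !== false) {
            $pos = $nl + 1;
            $offsets[++$line] = $base + $pos;   // next line starts after the newline
        }
        $base += strlen($chunk);
    }
    return $offsets;
}

// Demo on an in-memory stream; a tiny chunk size exercises the chunk boundaries.
$fp = fopen('php://memory', 'w+');
fwrite($fp, "alpha\nbeta\ngamma\n");
rewind($fp);
$offsets = indexLineOffsets($fp, 4);
print_r($offsets);
```

If the file ends with a newline, the last entry points at end of file rather than at a real line, so a caller would need to allow for that.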
It really depends on how you write your code... I ran the following
script on a 150 meg text log file containing 1905883 lines in 4 seconds
(note that it performs caching). Here's the script:
<?php
$path = $argv[1];
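if( ($fPtr = fopen( $path, 'r' )) === false )
{
    echo "Couldn't open for reading: $path\n";
    exit();
}
// $lines[N] caches the byte offset at which line N starts.
$line = 1;
$lines[$line] = 0;
while( fgets( $fPtr ) !== false )
{
    // After reading line N, ftell() is the start of line N+1.
    $lines[++$line] = ftell( $fPtr );
}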
Couldn't you avoid incrementing a counter variable by simply starting the
array at index 1?
$lines[1] = 0;
while( fgets( $fPtr ) !== false )
{
    $lines[] = ftell( $fPtr );
}
Wouldn't this make it faster?
fclose( $fPtr );
?>
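One way to answer the question raised inside the script above is to run both loop variants on the same data and time them. The sketch below does that on generated test data in a php://memory stream. The function names buildWithCounter() and buildWithAppend() and the 100,000-line sample are assumptions for illustration, and any timing difference will depend on the PHP version and machine, so no result is claimed here.

```php
<?php
// Sketch only: compare the counter-based loop with the append-based variant.
// buildWithCounter(), buildWithAppend() and the sample data are illustrative assumptions.
function buildWithCounter($fPtr): array
{
    rewind($fPtr);
    $line = 1;
    $lines = [];
    $lines[$line] = 0;
    while (fgets($fPtr) !== false) {
        $lines[++$line] = ftell($fPtr);
    }
    return $lines;
}

function buildWithAppend($fPtr): array
{
    rewind($fPtr);
    $lines = [];
    $lines[1] = 0;
    // $lines[] uses the largest existing integer key plus one, so the next key is 2.
    while (fgets($fPtr) !== false) {
        $lines[] = ftell($fPtr);
    }
    return $lines;
}

$fPtr = fopen('php://memory', 'w+');
fwrite($fPtr, str_repeat("some log line\n", 100000));

foreach (['buildWithCounter', 'buildWithAppend'] as $fn) {
    $start = microtime(true);
    $result = $fn($fPtr);
    printf("%s: %d entries in %.4f s\n", $fn, count($result), microtime(true) - $start);
}
```

Both variants build the same array, so the choice between them only affects speed and readability, not the resulting cache.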
Here are the run times for several iterations (Athlon 2400+):

real 0m4.065s
user 0m3.488s
sys 0m0.464s

real 0m4.005s
user 0m3.464s
sys 0m0.436s

real 0m5.816s
user 0m3.336s
sys 0m0.536s

real 0m3.994s
user 0m3.384s
sys 0m0.504s

real 0m4.069s
user 0m3.512s
sys 0m0.444s

real 0m4.009s
user 0m3.344s
sys 0m0.552s
Cheers,
Rob.
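The script in this thread builds the offset cache but does not show how it would be used to jump around the file. A minimal sketch of that step follows, assuming the cache maps a 1-based line number to a byte offset as in the thread. The helper names buildLineIndex() and readLine() are hypothetical, and a php://memory stream stands in for the real log file.

```php
<?php
// Sketch only: fetch an arbitrary line via a cached table of line-start offsets.
// buildLineIndex() and readLine() are hypothetical helper names.
function buildLineIndex($fPtr): array
{
    rewind($fPtr);
    $lines = [1 => 0];               // line 1 starts at byte 0
    while (fgets($fPtr) !== false) {
        $lines[] = ftell($fPtr);     // start of the following line
    }
    return $lines;
}

function readLine($fPtr, array $lines, int $num)
{
    // Unknown line number or failed seek: report failure like fgets() does.
    if (!isset($lines[$num]) || fseek($fPtr, $lines[$num]) !== 0) {
        return false;
    }
    return fgets($fPtr);             // false if the offset is at end of file
}

$fPtr = fopen('php://memory', 'w+');
fwrite($fPtr, "first\nsecond\nthird\n");
$index = buildLineIndex($fPtr);

echo readLine($fPtr, $index, 3);     // jumps straight to the third line
echo readLine($fPtr, $index, 1);     // and back to the first, without rescanning
```

The index costs one full pass over the file, after which each lookup is a single fseek() plus one fgets(), which suits the "constantly jumping around" access pattern described earlier in the thread.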