Re: How to jump to line number in large file


Robert Cummings wrote:
On Sat, 2008-04-05 at 19:09 +0100, Steve McGill wrote:
"Richard Heyes" <richardh@xxxxxxxxxxx> wrote in message news:47F75D2A.3020101@xxxxxxxxxxxxxx
Thanks for the heads up on fgetc() advancing the file pointer by one. I hadn't actually tested that code yet; I was still using the original fseek($handle, $pos).

strpos() would be ideal, but it needs to work on a string and not a file, and I don't want to load a 100 MB file into memory if I don't have to. Perhaps I should test how quick the fgets() and ftell() method is, because at least it loads in only one line at a time.

Does anybody know any other ways to go about the problem?
Haven't read the rest of the thread, and so going by the subject alone, fgets() finishes when it encounters a newline, so you can use this wondrous fact to seek to a specific line:

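<?php
    $fp  = fopen('filename', 'r');
    if ($fp === false) {
        exit("Couldn't open file\n");
    }
    $num = 18; // Desired line number (1-based)

    // Read and discard lines until the desired one is in $line
    for ($i = 0; $i < $num; $i++) {
        $line = fgets($fp); // false if the file has fewer than $num lines
    }

    fclose($fp);
    echo $line;
?>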

It works because fgets() stops when it encounters a newline (\n). So it's just a case of counting the calls to fgets().
fgets() would work, but as I'm constantly jumping around a 500,000-line file, I thought it was better to maintain a cache of line number positions.
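
For anyone following along, the cache idea could look roughly like the sketch below: scan the file once, remember the byte offset where each line starts, then fseek() straight to any line. The helper names buildLineIndex() and readLineAt() are made up for illustration, the php://temp stream only stands in for the real file, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Sketch only: buildLineIndex() and readLineAt() are hypothetical helpers.

// Scan the file once and record the byte offset where each line starts.
function buildLineIndex($fp): array
{
    rewind($fp);
    $line = 1;
    $offsets = [$line => 0];          // line 1 starts at byte 0
    while (fgets($fp) !== false) {
        $offsets[++$line] = ftell($fp);
    }
    unset($offsets[$line]);           // last entry is the EOF position, not a line
    return $offsets;
}

// Jump directly to line $n using the cached offset; null if there is no such line.
function readLineAt($fp, array $offsets, int $n): ?string
{
    if (!isset($offsets[$n])) {
        return null;
    }
    fseek($fp, $offsets[$n]);
    $line = fgets($fp);
    return $line === false ? null : rtrim($line, "\r\n");
}

// In-memory stand-in for the real file (assumption for the demo).
$fp = fopen('php://temp', 'w+');
fwrite($fp, "alpha\nbeta\ngamma\n");

$index = buildLineIndex($fp);
echo readLineAt($fp, $index, 3), "\n";   // gamma
echo readLineAt($fp, $index, 1), "\n";   // alpha
```

The index costs one integer per line, so for a 500,000-line file it stays small compared with holding the file contents in memory.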

As a final update to anybody following:

- Taking away the unnecessary fseek() made the script execute in 63 seconds
- Using a buffer system (reading in 1 MB of the text file at a time and then looping through the string in memory) made the script execute in 36 seconds. Huge improvement, but...
- Porting the code to C++, doing a shell_exec and reading the results back in to PHP, took less than 2 seconds.
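
The buffer approach mentioned in the list might be sketched as follows. This is not the original script, which wasn't posted: indexByChunks() is a hypothetical name, and it scans each chunk with strpos() rather than looping character by character. The demo uses a deliberately tiny chunk size so that newlines fall on both sides of chunk boundaries.

```php
<?php
// Sketch only: indexByChunks() is a hypothetical helper, not the poster's code.

// Read the file in fixed-size chunks and find newlines in memory with strpos().
function indexByChunks($fp, int $chunkSize = 1048576): array
{
    rewind($fp);
    $offsets = [1 => 0];   // byte offset where each line starts
    $base = 0;             // file offset of the current chunk's first byte
    while (($chunk = fread($fp, $chunkSize)) !== false && $chunk !== '') {
        $pos = 0;
        while (($nl = strpos($chunk, "\n", $pos)) !== false) {
            $offsets[] = $base + $nl + 1;   // next line starts right after "\n"
            $pos = $nl + 1;
        }
        $base += strlen($chunk);
    }
    // If the last recorded offset is the end of the file, it is not a real line.
    if (end($offsets) === $base) {
        array_pop($offsets);
    }
    return $offsets;
}

// In-memory stand-in for the real file (assumption for the demo).
$fp = fopen('php://temp', 'w+');
fwrite($fp, "alpha\nbeta\ngamma\n");

$offsets = indexByChunks($fp, 4);   // 4-byte chunks, just to exercise boundaries
print_r($offsets);                  // keys 1, 2, 3 with offsets 0, 6, 11
```

Because offsets are computed as chunk base plus position within the chunk, a line that straddles two chunks needs no special handling.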

As fgetc() etc. are all effectively C wrappers, I was quite surprised at the speed increase.

It really depends on how you write your code... I ran the following
script on a 150 MB text log file containing 1,905,883 lines in 4 seconds
(note that it builds the cache of line positions). Here's the script:

<?php

$path = $argv[1];

if( ($fPtr = fopen( $path, 'r' )) === false )
{
    echo "Couldn't open for reading: $path\n";
    exit();
}

$line = 1;
$lines[$line] = 0;

while( fgets( $fPtr ) !== false )
{
    $lines[++$line] = ftell( $fPtr );
}

Couldn't you avoid incrementing a counter variable by simply starting the array at index 1?

$lines[1] = 0;

while( fgets( $fPtr ) !== false )
{
    $lines[] = ftell( $fPtr );
}

Wouldn't this make it faster?


fclose( $fPtr );

?>
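
On the inline question about dropping the counter: as far as the resulting array goes, the two forms are interchangeable, because $lines[] = ... appends at the highest existing integer key plus one. The small check below, run against an in-memory stand-in file (an assumption for the demo), builds the index both ways and compares them. Whether the append form is actually faster would have to be measured; this only shows the output is the same.

```php
<?php
// Sketch: the counter form and the append form produce the same index.
$fp = fopen('php://temp', 'w+');     // in-memory stand-in for the log file
fwrite($fp, "one\ntwo\nthree\n");

// Counter form, as in the script in this thread.
rewind($fp);
$line = 1;
$a = [$line => 0];
while (fgets($fp) !== false) {
    $a[++$line] = ftell($fp);
}

// Append form: $b[] uses the highest integer key plus one.
rewind($fp);
$b = [1 => 0];
while (fgets($fp) !== false) {
    $b[] = ftell($fp);
}

var_dump($a === $b);   // identical keys and values, in the same order
```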

Here are the run times for the caching script over several iterations (Athlon 2400+):

real    0m4.065s
user    0m3.488s
sys     0m0.464s

real    0m4.005s
user    0m3.464s
sys     0m0.436s

real    0m5.816s
user    0m3.336s
sys     0m0.536s

real    0m3.994s
user    0m3.384s
sys     0m0.504s

real    0m4.069s
user    0m3.512s
sys     0m0.444s

real    0m4.009s
user    0m3.344s
sys     0m0.552s

Cheers,
Rob.


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

