Re: How to jump to line number in large file

So the only variation on a theme that I didn't test is the one that performs 
the best by an order of magnitude... nice. Many thanks for your time 
everyone.
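
For anyone finding this in the archives: the fgets()/ftell() offset cache from Rob's script (quoted below) only builds the index, so here is a rough sketch of how the lookup side might work. The function names build_line_index() and read_line() are made up for illustration and are not from the thread, and the code assumes PHP 7 or later and "\n" line endings.

```php
<?php
// Hypothetical helper: map each 1-based line number to the byte offset
// at which that line starts, using one fgets()/ftell() pass over the file.
function build_line_index(string $path): array
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Couldn't open for reading: $path");
    }

    $line = 1;
    $offsets = [$line => 0];
    while (fgets($fp) !== false) {
        // The position after a read is where the next line would start.
        $offsets[++$line] = ftell($fp);
    }
    fclose($fp);

    // The final entry points at end-of-file rather than a real line.
    unset($offsets[$line]);
    return $offsets;
}

// Hypothetical helper: seek straight to line $num and return it,
// or false if the index has no such line.
function read_line($fp, array $offsets, int $num)
{
    if (!isset($offsets[$num])) {
        return false;
    }
    fseek($fp, $offsets[$num]);
    return fgets($fp);
}

// Usage (assuming a file called 'filename' exists):
//   $offsets = build_line_index('filename');
//   $fp = fopen('filename', 'r');
//   echo read_line($fp, $offsets, 18);
//   fclose($fp);
```

The index costs one integer per line, so it stays small next to the file itself, and each later jump is a single fseek() followed by one fgets().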

"Robert Cummings" <robert@xxxxxxxxxxxxx> wrote in message 
news:1207423078.6774.114.camel@xxxxxxxxxx
>
> On Sat, 2008-04-05 at 19:09 +0100, Steve McGill wrote:
>> "Richard Heyes" <richardh@xxxxxxxxxxx> wrote in message
>> news:47F75D2A.3020101@xxxxxxxxxxxxxx
>> >> Thanks for the heads up on fgetc() incrementing by one. I hadn't actually
>> >> tested that code yet, I was using the original fseek($handle,$pos).
>> >>
>> >> strpos would be ideal but it needs to work on a string and not a file - I
>> >> don't want to load a 100MB file into memory if I don't have to. Perhaps I
>> >> should test how quick the fgets() and ftell() method is because at least
>> >> it loads in one line at a time.
>> >>
>> >> Does anybody know any other ways to go about the problem?
>> >
>> > Haven't read the rest of the thread, and so going by the subject alone,
>> > fgets() finishes when it encounters a newline, so you can use this
>> > wondrous fact to seek to a specific line:
>> >
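>> > <?php
>> >     $fp  = fopen('filename', 'r');
>> >     $num = 18; // Desired line number (1-based)
>> >
>> >     // Each fgets() call consumes one line, so after $num calls
>> >     // $line holds the desired line (or false if the file is shorter).
>> >     for ($i = 0; $i < $num; $i++) {
>> >         $line = fgets($fp);
>> >     }
>> >
>> >     echo $line;
>> >     fclose($fp);
>> > ?>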
>> >
>> > It works because fgets() stops when it encounters a newline (\n). So it's
>> > just a case of counting the calls to fgets().
>>
>> fgets() would work but as I'm constantly jumping around a 500,000 line file
>> I thought it was better to maintain a cache of line number positions.
>>
>> As a final update to anybody following:
>>
>> - Taking away the unnecessary fseek() made the script execute in 63
>>   seconds.
>> - Using a buffer system (reading in 1MB of the text file at a time and then
>>   looping through the string in memory) made the script execute in 36
>>   seconds. Huge improvement, but...
>> - Porting the code to C++, doing a shell_exec() and reading the results
>>   back in to PHP, took less than 2 seconds.
>>
>> As fgetc() etc. are all effectively C wrappers I was quite surprised at the
>> speed increase....
>
> It really depends on how you write your code... I ran the following
> script on a 150 meg text log file containing 1905883 lines in 4 seconds
> (note that it performs caching). Here's the script:
>
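> <?php
>
> $path = $argv[1];
>
> if( ($fPtr = fopen( $path, 'r' )) === false )
> {
>    echo "Couldn't open for reading: $path\n";
>    exit( 1 );
> }
>
> // Cache of byte offsets: $lines[N] is where line N starts.
> $lines = array();
> $line = 1;
> $lines[$line] = 0;
>
> while( fgets( $fPtr ) !== false )
> {
>    // The position after reading a line is the start of the next one
>    // (the final entry therefore points at end-of-file).
>    $lines[++$line] = ftell( $fPtr );
> }
>
> fclose( $fPtr );
>
> ?>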
>
> Here are the run times on several iterations (Athlon 2400+):
>
> real    0m4.065s
> user    0m3.488s
> sys     0m0.464s
>
> real    0m4.005s
> user    0m3.464s
> sys     0m0.436s
>
> real    0m5.816s
> user    0m3.336s
> sys     0m0.536s
>
> real    0m3.994s
> user    0m3.384s
> sys     0m0.504s
>
> real    0m4.069s
> user    0m3.512s
> sys     0m0.444s
>
> real    0m4.009s
> user    0m3.344s
> sys     0m0.552s
>
> Cheers,
> Rob.
> -- 
> http://www.interjinn.com
> Application and Templating Framework for PHP
> 
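
For comparison, the buffer approach Steve describes above (reading the file in 1MB at a time and scanning the string in memory) might look roughly like the sketch below. It is not the code that was actually timed in this thread: the function name build_line_index_chunked() and the $chunkSize parameter are invented for illustration, and it assumes PHP 7.1 or later and "\n" line endings.

```php
<?php
// Hypothetical helper: build the same line-number => byte-offset map,
// but by reading fixed-size chunks and scanning them with strpos().
function build_line_index_chunked(string $path, int $chunkSize = 1048576): array
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Couldn't open for reading: $path");
    }

    $line = 1;
    $offsets = [$line => 0];
    $base = 0; // byte offset of the current chunk within the file

    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }

        // Every newline in the chunk marks the start of another line.
        $pos = 0;
        while (($nl = strpos($chunk, "\n", $pos)) !== false) {
            $offsets[++$line] = $base + $nl + 1;
            $pos = $nl + 1;
        }
        $base += strlen($chunk);
    }
    fclose($fp);

    // $base is now the file size; drop a final entry that sits at EOF.
    if ($offsets[$line] >= $base) {
        unset($offsets[$line]);
    }
    return $offsets;
}
```

Because only absolute offsets are recorded, a line that straddles two chunks needs no special handling. Memory use is bounded by the chunk size plus the index itself.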



-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

