fwrite() vs file_put_contents()

I have been pondering whether it would be feasible to work with a 100,000 entry index
file, and had put yesterday aside to do some timing tests. I first generated some sample
index files of various lengths. Each entry consisted of a single line with the form

ASDF;rhubarb, rhubarb, ....

where the ASDF is a randomly generated four character index, and the rest of the line is
filling, which varies slightly in length and contents from line to line, just in case
something tried to get smart and cache the line. The average length of the line is about
80 bytes.
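A generator along these lines might look like the following. This is a reconstruction, not the original script; the filename, word list and filler lengths are my own choices, picked to give lines averaging roughly 80 bytes.

```php
<?php
// Generate a sample index file: one line per entry, in the form
// KEY;filler, filler, ... with a random four-character key.
$entries = 100000;
$chars   = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$words   = ['rhubarb', 'custard', 'trifle', 'sponge', 'treacle'];

$fh = fopen('index-sample.txt', 'w');
for ($i = 0; $i < $entries; $i++) {
    // Four random uppercase characters for the key.
    $key = '';
    for ($j = 0; $j < 4; $j++) {
        $key .= $chars[mt_rand(0, 25)];
    }
    // Filler that varies slightly in length and content from line to
    // line, to defeat any caching of identical lines.
    $filler = [];
    $count  = mt_rand(8, 12);
    for ($j = 0; $j < $count; $j++) {
        $filler[] = $words[mt_rand(0, 4)];
    }
    fwrite($fh, $key . ';' . implode(', ', $filler) . "\n");
}
fclose($fh);
```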

Then I wrote another program which read the file into an array, using the four character
index as the key, and the filling as the contents, sorted the array, and then rewrote it
to another file, reporting the elapsed time after each step.

My first version used fgets() to read the source file a line at a time, and fwrite() to
write the new file. This version performed quite consistently, and took approximately 1.3
seconds to read in a 100,000 entry 7.86 MB file, and another 5 seconds to write it out
again.
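The read/sort/write loop in that first version looked roughly like this. Again a reconstruction rather than my actual test script; the filenames and the tiny inline sample file are illustrative only.

```php
<?php
// A small sample file so the sketch runs standalone; the real tests
// used generated files of 10,000 to 100,000 entries.
file_put_contents('index-sample.txt',
    "QWER;rhubarb, rhubarb\nASDF;custard, trifle\nZXCV;sponge, treacle\n");

// Read a line at a time with fgets(), keyed on the four-character index.
$start = microtime(true);
$index = [];
$in = fopen('index-sample.txt', 'r');
while (($line = fgets($in)) !== false) {
    // Split on the first semicolon: key on the left, filling on the right.
    list($key, $filler) = explode(';', rtrim($line, "\n"), 2);
    $index[$key] = $filler;
}
fclose($in);
printf("read:  %.3f s\n", microtime(true) - $start);

// Sort by the four-character key.
$start = microtime(true);
ksort($index);
printf("sort:  %.3f s\n", microtime(true) - $start);

// Write the sorted array back out, one fwrite() per line.
$start = microtime(true);
$out = fopen('index-sorted.txt', 'w');
foreach ($index as $key => $filler) {
    fwrite($out, $key . ';' . $filler . "\n");
}
fclose($out);
printf("write: %.3f s\n", microtime(true) - $start);
```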

I then read the discussion following fschnittke's post "File write operation slows to a
crawl ... " and wondered if the suggestions made there would help.

First I used file() to read the entire file into memory, then processed each line into the
form required to set up my matrix. This gave a useful improvement for small files, halving
the time required to read and process a 10,000 entry 815 kB file, but for a 30,000 entry
file the gain had dropped to about 15%, and it made little difference for a 300,000 entry file.
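The file() variant only changes the read side: slurp every line into memory in one call, then split each one. A sketch, with an illustrative inline sample in place of the generated test files:

```php
<?php
// Small standalone sample; the real tests read generated files.
file_put_contents('index-sample.txt',
    "QWER;rhubarb, rhubarb\nASDF;custard, trifle\n");

$start = microtime(true);
$index = [];
// file() returns the whole file as an array of lines in one call;
// FILE_IGNORE_NEW_LINES saves an rtrim() per line.
foreach (file('index-sample.txt', FILE_IGNORE_NEW_LINES) as $line) {
    list($key, $filler) = explode(';', $line, 2);
    $index[$key] = $filler;
}
printf("read+process: %.3f s\n", microtime(true) - $start);
```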

Then I tried writing my whole array into a single horrendous string, and using
file_put_contents() to write out the whole string in one bang. I started testing on a
short file, and thought I was onto a good thing, as it halved the time to write out a
10,000 entry 800 K file. But as I increased the file size it began to fail dismally. With
a 30,000 entry file it was 20% slower, and at 100,000 entries it was three times slower.
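The single-string variant changes the write side: concatenate every line into one string and make a single file_put_contents() call. A sketch (the small inline array stands in for the sorted 100,000-entry array):

```php
<?php
// Stand-in for the sorted index array built by the read step.
$index = ['QWER' => 'rhubarb, rhubarb', 'ASDF' => 'custard, trifle'];
ksort($index);

$start = microtime(true);
// Build one horrendous string holding the entire output file.
$blob = '';
foreach ($index as $key => $filler) {
    $blob .= $key . ';' . $filler . "\n";
}
// Write it all out in one call.
file_put_contents('index-sorted.txt', $blob);
printf("write: %.3f s\n", microtime(true) - $start);
```

An alternative with the same single write is to collect the lines in an array and implode() them once at the end, rather than growing one string with .= in a loop.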

On Shawn McKenzie's suggestion, I also tried replacing fgets() with stream_get_line(). As
I had anticipated, any difference was well below the timing noise level.
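stream_get_line() is nearly a drop-in replacement for fgets(): it takes a maximum length and a delimiter, and returns the line with the delimiter already stripped, so the rtrim() goes away. A sketch of the substitution:

```php
<?php
// Standalone sample in place of the generated test files.
file_put_contents('index-sample.txt',
    "QWER;rhubarb, rhubarb\nASDF;custard, trifle\n");

$index = [];
$in = fopen('index-sample.txt', 'r');
// Read up to 4096 bytes or to the next "\n", whichever comes first;
// the returned string does not include the delimiter.
while (($line = stream_get_line($in, 4096, "\n")) !== false) {
    list($key, $filler) = explode(';', $line, 2);
    $index[$key] = $filler;
}
fclose($in);
```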

In conclusion, for short (under about 1 MB) files, using file() to read the whole file into
memory is substantially better than using fgets() to read the file a line at a time, but the
advantage rapidly diminishes for longer files. Similarly, using file_put_contents() in
place of fwrite() to write it out again is better for short files (up to perhaps 1 MB), but
performance deteriorates rapidly above this.


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

