On Sunday 28 November 2010, Larry Garfield <larry@xxxxxxxxxxxxxxxx> wrote:
> There are many things that everybody "knows" about optimizing PHP code.
> One of them is that one of the most expensive parts of the process is
> loading code off of disk and compiling it, which is why opcode caches
> are such a big performance boost. The corollary to that, of course, is
> that more files = more IO and therefore more of a performance hit.

It depends on the implementation that PHP uses to open the file. For
example, on Linux and similar operating systems PHP uses the mmap(2)
function instead of read(2) or fread(3), so it maps the complete file
into memory, which is faster than doing partial file reads.

> But... this is 'effin 2010. It's almost bloody 2011. Operating systems
> are smart. They already have 14 levels of caching built into them from
> hard drive micro-controller to RAM to CPU cache to OS. I've heard from
> other people (who should know) that the IO cost of doing a file_exists()
> or other stat calls is almost non-existent because a modern OS caches
> that, and with OS-based file caching even reading small files off disk
> (the size that most PHP source files are) is not as slow as we think.

Yes, that's true. This point depends on how the operating system has
implemented its VFS. Linux, FreeBSD and other platforms have solid VFS
implementations with good caching for concurrent reads: only the first
read of a file hits the disk /hard/; after that the cached file
metadata (inode data) is used. There is a small stat-cache timing
sketch at the end of this message.

> Personally, I don't know. I am not an OS engineer and haven't
> benchmarked such things, nor am I really competent to do so. However,
> it makes a huge impact on the way one structures a large PHP program as
> the performance trade-offs of huge files with massive unused code (that
> has to be compiled) vs the cost of reading lots of separate files from
> disk (more IO) is highly dependent on the speed of the aforementioned IO
> and of compilation.

You can run your own benchmark tests, from a high-level or a low-level
perspective. If you want to trace how PHP reads files from the hard
drive, you can use an extension like xdebug. For example, if you prefer
require() and include() over require_once() and include_once() for
files that are included repeatedly, you will probably get lower
performance, because you will do real repeated reads of those files
(see the second sketch at the end of this message).

> So... does anyone have any actual, hard data here? I don't mean "I
> think" or "in my experience". I am looking for hard benchmarks,
> profiling, or writeups of how OS (Linux specifically if it matters) file
> caching works in 2010, not in 1998.

Well, it also depends on the operating system configuration. If you
just want to know the performance of the IO functions in PHP, I suggest
using an extension like xdebug. It can generate profiling information
to be used with kcachegrind, so you can properly visualize how the IO
functions in PHP are behaving. Probably an mmap(2) extension for PHP
would be useful for certain kinds of files; the file_get_contents()
function uses open(2)/read(2), so you can't do a quick memory-mapped
read of a file with it.

> Modernizing what "everyone knows" is important for the general community,
> and the quality of our code.
>
> --Larry Garfield

Best regards,
--
Daniel Molina Wegener <dmw [at] coder [dot] cl>
System Programmer & Web Developer
Phone: +56 (2) 979-0277 | Blog: http://coder.cl/ |
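
P.S. Here is the stat-cache sketch mentioned above. It is only a rough
example, not hard data: the target file name and the iteration count are
invented, and the timings will depend on your kernel's inode/dentry
caches and on PHP's own stat cache (clearstatcache()).

  <?php
  // Rough sketch: time repeated stat calls on a warm cache.
  // The file name and iteration count are arbitrary examples.
  $file = '/etc/hostname';
  $iterations = 100000;

  $start = microtime(true);
  for ($i = 0; $i < $iterations; $i++) {
      file_exists($file);   // answered from PHP/OS caches after the first call
  }
  $elapsed = microtime(true) - $start;
  printf("%d cached file_exists() calls: %.6f sec\n", $iterations, $elapsed);

  // Force PHP to drop its own stat cache each time; the kernel's
  // inode/dentry cache will usually still answer without touching the disk.
  $start = microtime(true);
  for ($i = 0; $i < $iterations; $i++) {
      clearstatcache();
      file_exists($file);
  }
  $elapsed = microtime(true) - $start;
  printf("%d uncached-in-PHP calls:      %.6f sec\n", $iterations, $elapsed);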
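
And here is the second sketch, comparing repeated require() against
require_once() of the same small file. The temporary file and the
iteration count are again only illustrative.

  <?php
  // Rough sketch: repeated require() vs require_once() of one small file.
  $file = tempnam(sys_get_temp_dir(), 'inc');
  file_put_contents($file, "<?php return 1;\n");
  $iterations = 10000;

  $start = microtime(true);
  for ($i = 0; $i < $iterations; $i++) {
      require $file;        // re-read (and, without an opcode cache,
                            // re-compiled) on every iteration
  }
  printf("require:      %.4f sec\n", microtime(true) - $start);

  $start = microtime(true);
  for ($i = 0; $i < $iterations; $i++) {
      require_once $file;   // only the first iteration actually loads the file
  }
  printf("require_once: %.4f sec\n", microtime(true) - $start);

  unlink($file);

With an opcode cache such as APC the gap should narrow, since the
compilation step is cached, but each require() still resolves and
executes the file.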