Re: Disk IO performance

On Sunday 28 November 2010, Larry Garfield <larry@xxxxxxxxxxxxxxxx> wrote:

> There are many things that everybody "knows" about optimizing PHP code.
> One of them is that one of the most expensive parts of the process is
> loading code off of disk and compiling it, which is why opcode caches
> are such a big performance boost. The corollary to that, of course, is
> that more files = more IO and therefore more of a performance hit.

It depends on how PHP opens the file. For example, on Linux and
similar operating systems PHP uses mmap(2) instead of read(2) or
fread(3), so it maps the complete file into memory, which is faster
than doing a series of partial reads.
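A rough way to see the difference from userland is to time one
whole-file read against a loop of small reads. A minimal sketch (the
path is just a placeholder):

<?php
// Micro-benchmark sketch; /tmp/test.dat is a placeholder path.
$file = '/tmp/test.dat';

// One-shot read: PHP can satisfy this with a single mapping or read.
$t = microtime(true);
$data = file_get_contents($file);
printf("file_get_contents: %.6f s\n", microtime(true) - $t);

// Chunked reads: many small read-style calls through the stream layer.
$t = microtime(true);
$fp = fopen($file, 'rb');
while (!feof($fp)) {
    fread($fp, 4096);
}
fclose($fp);
printf("fread loop:        %.6f s\n", microtime(true) - $t);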

>

> But... this is 'effin 2010. It's almost bloody 2011. Operating systems
> are smart. They already have 14 levels of caching built into them from
> hard drive micro-controller to RAM to CPU cache to OS. I've heard from
> other people (who should know) that the IO cost of doing a file_exists()
> or other stat calls is almost non-existent because a modern OS caches
> that, and with OS-based file caching even reading small files off disk
> (the size that most PHP source files are) is not as slow as we think.

Yes, that's true. This depends on how the operating system implements
its VFS. Linux, FreeBSD and other platforms have well-done VFS
implementations with good caching for concurrent reads: the first read
of a file is the expensive one, and after that, lookups are served from
the cached file location (inode data).
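Note that PHP also keeps its own stat cache on top of the kernel's. A
quick sketch to compare the two (the path is a placeholder):

<?php
// Sketch: time repeated stat calls on the same file.
$file = '/etc/hostname';  // placeholder path

// Force a real stat each time; this mostly measures the kernel's
// dentry/inode cache.
$t = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    clearstatcache();
    file_exists($file);
}
printf("stat, PHP cache cleared: %.4f s\n", microtime(true) - $t);

// Let PHP's userland stat cache answer after the first call.
$t = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    file_exists($file);
}
printf("stat, PHP cache warm:    %.4f s\n", microtime(true) - $t);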

>

> Personally, I don't know. I am not an OS engineer and haven't
> benchmarked such things, nor am I really competent to do so. However,
> it makes a huge impact on the way one structures a large PHP program as
> the performance trade-offs of huge files with massive unused code (that
> has to be compiled) vs the cost of reading lots of separate files from
> disk (more IO) is highly dependent on the speed of the aforementioned IO
> and of compilation.

You can run your own benchmarks, from a high-level or a low-level
perspective. If you want to trace how PHP reads files from the hard
drive, you can use an extension like Xdebug. For example, if you use
require() and include() instead of require_once() and include_once()
for repeated inclusions, you will probably see lower performance,
because the same files really are read (and compiled) again on every
call.
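A minimal sketch of that comparison (the included file only returns a
value, so including it twice is safe):

<?php
// Sketch: repeated include() versus include_once().
$file = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($file, '<?php return 42;');

$t = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    include $file;        // re-read and re-compiled every iteration
}
printf("include:      %.4f s\n", microtime(true) - $t);

$t = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    include_once $file;   // only the first iteration touches the disk
}
printf("include_once: %.4f s\n", microtime(true) - $t);

unlink($file);

Without an opcode cache this also measures compilation time, which is
part of the point.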

>

> So... does anyone have any actual, hard data here? I don't mean "I
> think" or "in my experience". I am looking for hard benchmarks,
> profiling, or writeups of how OS (Linux specifically if it matters) file
> caching works in 2010, not in 1998.

Well, it also depends on the operating system configuration. If you
just want to know the performance of the IO functions in PHP, I suggest
using an extension like Xdebug. It can generate profiling information
for kcachegrind, so you can properly visualize how the IO functions in
PHP are behaving.
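For example, with Xdebug 2 the profiler can be switched on from
php.ini (the extension path is distribution-specific):

; php.ini -- Xdebug 2 profiler settings
zend_extension=/usr/lib/php5/xdebug.so   ; placeholder path
xdebug.profiler_enable=1
xdebug.profiler_output_dir=/tmp

The generated /tmp/cachegrind.out.* files can then be opened in
kcachegrind.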

An mmap(2) extension for PHP would probably be useful for certain
kinds of files. The file_get_contents() function uses open(2)/read(2),
so you can't do a quick memory-mapped read of a file.
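Just to illustrate the idea, a hypothetical userland API (no such
functions exist in PHP today) next to what is actually available:

<?php
// Hypothetical sketch only: mmap_open() and mmap_read() do not exist.
// $map  = mmap_open('/path/to/file');   // would map the file, no copying
// $head = mmap_read($map, 0, 4096);     // would read straight from the map
// What is actually available is a buffered read, optionally partial:
$head = file_get_contents('/path/to/file', false, null, 0, 4096);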

>

> Modernizing what "everyone knows" is important for the general community,
> and the quality of our code.
>
> --Larry Garfield

Best regards,
--
Daniel Molina Wegener <dmw [at] coder [dot] cl>
System Programmer & Web Developer
Phone: +56 (2) 979-0277 | Blog: http://coder.cl/
