Wow, thanks for the helpful breakdown.
PHP's model is to be completely sandboxed, so that every request is entirely separate from every other. Having a persistent interpreter as you describe would break that rule, and with it PHP's model of near-unlimited horizontal scalability.
Understood. A persistent interpreter is a whole different approach. I appreciate your perspective on that; it helps me reconsider what I'm doing overall and why I want to do it :)
Of course, there is nothing that prevents you from storing persistent data somewhere more permanent. If it is just simple read-only data, you have a lot of options. For example, you could put it in an .ini file that is loaded only at Apache startup and use get_cfg_var() to fetch the values. If you compile PHP with the --with-config-file-scan-dir configure switch to set a configuration scan directory, you can just drop your own .ini file into that directory and it will be read on startup. These are just key=value pairs and not PHP code, of course.
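A minimal sketch of the .ini approach, assuming a hypothetical file (e.g. conf.d/myapp.ini) dropped into the scan directory; the file name, the myapp.* keys and the myapp_config() helper are all made up for illustration. get_cfg_var() returns false for a key that was not loaded, so the helper falls back to a default.

```php
<?php
// Hypothetical file in the configuration scan directory, e.g. conf.d/myapp.ini,
// holding plain key=value pairs (not PHP code):
//
//   myapp.site_name = "Example Site"
//   myapp.max_items = 50
//
// get_cfg_var() reads entries from the configuration loaded at startup and
// returns false when the key is absent, so supply a fallback value.
function myapp_config(string $key, $default = null)
{
    $value = get_cfg_var($key);
    return $value === false ? $default : $value;
}

// Values come back as strings, so cast numeric settings explicitly.
$siteName = myapp_config('myapp.site_name', 'Untitled');
$maxItems = (int) myapp_config('myapp.max_items', 20);
```

Because the file is parsed once at server startup, each request only pays for a lookup, not for parsing.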
I'm dealing with large multidimensional arrays, like this:
$categories = array (
1 => array ( 'name' => 'Autos',
'depth' => 1,
...
:(
If you need to do something fancier, you can stick things in shared memory. Many of the accelerators give you access to their shared memory segments. For example, the CVS version of pecl/apc provides apc_store() and apc_fetch(), which let you store PHP datatypes in shared memory directly without needing to serialize/unserialize them.
That pecl/apc feature sounds like a great, cheap solution to my giant global variable definition problem, which takes the biggest single chunk of parsing time. The key, AFAICS, is avoiding the (un)serialization time. I'd love to see an example if you have one, just to give me a target to aim for. I'm unfamiliar with C, with reading C source code, and with shared memory, so I'm having a tough time figuring out how to use that feature.
It's trivial. You check to see if your data is in shared memory. If it isn't, you create it and store it there using some identifier. Like this:
$huge_array = apc_fetch('my_huge_array');
if ($huge_array === false) {
    // Not in shared memory yet: build it once and store it for later requests.
    $huge_array = array( ... );
    apc_store('my_huge_array', $huge_array);
}
You can optionally pass a 3rd argument to apc_store(), which is a timeout (in seconds) for the data. That means you can tell APC that the data is only valid for 600 seconds, for example.
apc_store('my_huge_array', $huge_array, 600);
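If you have several large arrays, the fetch-or-build pattern can be wrapped in a small helper. This is a sketch, not part of pecl/apc: the name cached_build() is hypothetical, and it assumes APCu, the extension that carries APC's user-cache functions forward on current PHP versions under the apcu_ prefix (apcu_fetch()/apcu_store()). The by-reference $found flag lets a legitimately empty or false value count as a cache hit; if APCu is not loaded, the value is simply rebuilt on every request.

```php
<?php
/**
 * Return the value cached under $key, building and caching it on a miss.
 * Assumes the APCu extension; without it, $build runs on every call.
 */
function cached_build(string $key, callable $build, int $ttl = 0)
{
    if (function_exists('apcu_fetch')) {
        // $found separates a real miss from a stored false/empty value.
        $value = apcu_fetch($key, $found);
        if ($found) {
            return $value;
        }
    }

    $value = $build();

    if (function_exists('apcu_store')) {
        // A $ttl of 0 means no expiry; 600 keeps the entry for ten minutes.
        apcu_store($key, $value, $ttl);
    }

    return $value;
}

// Usage: build a (truncated, illustrative) category tree once per 600 s.
$categories = cached_build('my_huge_array', function () {
    return array(
        1 => array('name' => 'Autos', 'depth' => 1),
    );
}, 600);
```

The closure only runs on a cache miss, so the cost of building the array is paid once per TTL window rather than once per request. APCu is disabled for the CLI by default, so command-line scripts always take the rebuild path.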
And finally, the big hammer is to write your own PHP extension. This is a lot easier than people think, and for people who are really looking for performance this is the only way to go. You can write whatever code you want in your MINIT hook, which gets called only once at server startup, and in that hook you can create whatever persistent variables you need, pull stuff from a DB, etc. At the same time, you will likely want to move some of the heavier business logic identified by your profiling into the extension as well. This combination of C and PHP is extremely hard to beat performance-wise, regardless of what you compare it to.
This is something I'm VERY interested in. It is encouraging to hear that it is easier than I expected, and I will look into it further. Based on the responses I've gotten from the list, this seems like the most promising "total" solution. Any outstanding books/articles on the topic, considering I'm not a C programmer?
If you don't know any C at all it is going to be an uphill battle. Either spend some time and learn C, or hire someone to write this for you. When I said it was easier than people expect, I was assuming some level of C proficiency.
-Rasmus
-- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php