Struan Donald wrote:
* at 17/04 16:30 -0500 Daniel Kolbo said:
Hello,
I am writing a PHP script for a local application (not web/html
based). My script is taking a longer time to execute than I want. The
source code is a few thousand lines, so I will spare you all this
level of detail.
I prefer to write in PHP because that is what I know best. However, I
do not think there is anything inherent in my script that requires PHP
over C/C++/C#.
I think this points to an answer. If you're not too familiar with one
of the compiled languages then producing code that runs faster than
your current PHP implementation is a tall order. PHP, like most
scripting languages, is compiled into an internal format as a first
step and it's this that's then run. A lot of effort has gone into
making this pretty fast, and by deciding to rewrite in a compiled
language you are betting that the C code, or whatever you write, will
be faster. Given that translating a few thousand lines of PHP into one
of the languages you name is likely to be a significant effort, you'd
want to be sure of winning that bet.
If I wrote the console application in a C language (and compiled it),
would one expect to see any improvement in performance? If so, how much
improvement could one expect (in general)?
How long will it take you to convert the program? How much more time
will you spend on support and bugfixing?
I assume that because PHP is not compiled, this real-time interpretation
of the script by the Zend Engine must take some time. This is why I am
thinking about rewriting my whole script in a C language. But before I
begin that ordeal, I wanted to ask the community for their opinions. If
you think using a C language would suit me well, what language would you
recommend?
It's not real-time interpretation. It's a one-time parse and compile
when the script starts, and then it runs the internal bytecode. If you
have APC, or some other sort of caching mechanism, installed, then part
of the speed-up comes from caching the bytecode and saving the
initial parse and compile phase.
As to which language: if you do want to go ahead with this, you
should pick the one you know best. If you don't know any of them that
well, then I really think your time would be better spent on
optimising the existing PHP code first. Are you sure it's running as
fast as it can? Do you know where it's slow?
Rewriting it in another language really is the 50-pound lump hammer
solution to the problem if you've not tried anything else to speed it
up.
My google and mail archive searching for this yielded mainly PHP for web
apps, so I am asking all of you.
My main question is, how much of an improvement in performance will one
see by using a compiled version of an application versus using a
scripted version of an application?
I looked at PHP's bcompiler, but the documentation is minimal so I am
hesitant to dig much deeper into that, unless someone strongly suggests
otherwise.
A quick look at the docs tells me that all that bcompiler will do is
save you the initial parse and compile phase and not speed up the
execution of the code.
The one thing you don't say is exactly how long is too long? Is it
hours or minutes? If it's seconds then consider, as someone has
suggested elsewhere in the thread, looking at APC as that should cut
down the start up time of the script.
HTH and apologies if none of this is news to you.
Struan
You are correct in that I want to be pretty sure I win the bet before
translating all the code. The code really isn't that complicated, so I
think I am capable of translating it. Just a bunch of pretty small
functions. The code is a simulation-model type of program, so the
bottleneck in the code is the "looping". I know the exact function that
takes up 86% of the time. I have tried to rewrite this function from
different approaches. The bottom line is "work" needs to be done, and a
lot of it. I really can't think of any other ways to improve the
"logic" of the code at this time. Perhaps there are different methods I
could be using to speed up execution. Again, I think the source of the
issue is looping.
Here is the function that takes 86% of the time... This function is
called 500,000,000 times with different parameters ($arr and $arr2),
whose values I cannot predict, only their sizes.
========= C O D E === S T A R T ========
//this function is essentially a search and remove function for a nested array
foreach ($arr as $key => $value) {          //count($arr) == 3
    foreach ($value as $key2 => $value2) {  //0 <= count($value) <= 1000
        foreach ($arr2 as $value3) {        //count($arr2) == 2
            if (in_array($value3, $value2)) {
                unset($arr[$key][$key2]);
                break;
            }
        }
    }
}
========= C O D E === E N D ========
So essentially 3 foreach nested, invoking in_array(), and unset().
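Since count($arr2) is only 2, the two inner loops can be collapsed into a single array_intersect() call, which loops in C rather than in interpreted opcodes. Here is a sketch: the function name is mine, it assumes the leaf values are scalars, and whether it actually beats the original is something to benchmark, not a promise.

```php
<?php
//Sketch: the same search-and-remove, with the in_array() loop replaced by
//one array_intersect() call per slot. array_intersect($arr2, $value2) is
//non-empty exactly when some element of $arr2 occurs in $value2, which is
//the condition the original tests. Assumes scalar leaf values.
function searchAndRemove($arr, $arr2) {
    foreach ($arr as $key => $sub) {               //count($arr) == 3
        foreach ($sub as $key2 => $value2) {       //0 <= count($sub) <= 1000
            if (array_intersect($arr2, $value2)) { //any $arr2 element present?
                unset($arr[$key][$key2]);
            }
        }
    }
    return $arr;
}
```

Note that this version takes $arr by value and returns the pruned copy; thanks to copy-on-write that is cheaper than it looks.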
I rewrote the above code by making $arr a one-dimensional array and
'storing' the nested key values as a string index with a delimiter, so
that I could unset the original nested $arr by exploding this
index... I'll just show the code.
========= C O D E 2 ==== S T A R T =======
//first i prepare $arr
function CompressRay($some_nested_ray, $delimiter = "|") {
    //not really compression, just flattens the array
    //returns an array mapping key strings to final values
    $answer_ray = array();
    foreach ($some_nested_ray as $key => $value) {
        $key_string = (string)$key.$delimiter;
        if (is_array($value)) {
            $compressed_sub_ray = CompressRay($value, $delimiter);
            foreach ($compressed_sub_ray as $sub_key_string => $final_value) {
                $answer_ray[$key_string.$sub_key_string] = $final_value;
            }
        } else {
            $answer_ray[substr($key_string, 0, -1)] = $value;
        }
    }
    return $answer_ray;
}

$arr['compressed'] = CompressRay($arr);
//this part happens quickly, no worries so far

//then i call the below procedure, oh, about 500,000,000 times
foreach ($arr2 as $value3) {
    $key_strings = array_keys($arr['compressed'], $value3);
    foreach ($key_strings as $key_string) {
        $key_sequence = explode("|", $key_string);
        unset($all_vs_holes[$key_sequence[0]][$key_sequence[1]]);
        $upto_hole = substr($key_string, 0, -2);
        unset($arr['compressed'][$upto_hole."|0"]); //keep the compressed archive accurate
        unset($arr['compressed'][$upto_hole."|1"]); //keep the compressed archive accurate
    }
}
========= C O D E 2 ==== E N D =======
To my surprise, code2 was actually slower, twice as slow. I started
thinking that maybe passing the relatively large $arr by value 500
million times was taking up a lot of time... but some benchmark testing
I did didn't show much improvement (if any) from passing a large array
by reference. This seemed counterintuitive to me, and because I would
have to keep a duplicate copy of $arr if I did pass by reference, it
seemed like little gain would come of it.
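For what it's worth, that benchmark result is consistent with how PHP arrays behave: they are copy-on-write, so passing a large array by value only copies a handle, and the real duplication happens on the first write inside the callee. A minimal sketch (function names are illustrative):

```php
<?php
//Sketch of PHP's copy-on-write behaviour for arrays passed by value.
function readsOnly($a) {
    return count($a);     //no write here, so the array is never duplicated
}

function writes($a) {
    $a[] = 'extra';       //first write: now PHP actually copies the array
    return count($a);
}

$big = range(1, 100000);
readsOnly($big);          //cheap despite pass-by-value
writes($big);             //duplication happens inside, on the write
//$big is unchanged out here: the callee modified its own private copy
```

So by-value reads cost almost nothing, which is why switching to references showed little gain.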
Like I said, some work has to be done... these iterations have to be
performed.
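One possible way out, though: array_keys($haystack, $search_value) still scans the whole flattened array on every call, so code2 does at least as much searching as the original. An index keyed by value, built once, turns each lookup into a hash probe instead of a scan. A sketch, under the assumptions that the leaf values are scalars and that $arr changes only through these removals (the function names are mine, not from the original program):

```php
<?php
//Sketch of a value-keyed index: built once, it maps each leaf value to the
//list of (key, key2) slots containing it, so each removal is an O(1) hash
//probe plus the matching unsets, instead of a scan over all of $arr.
function buildIndex($arr) {
    $index = array();
    foreach ($arr as $key => $sub) {
        foreach ($sub as $key2 => $values) {
            foreach ($values as $v) {
                $index[$v][] = array($key, $key2);
            }
        }
    }
    return $index;
}

function removeByValues(&$arr, &$index, $arr2) {
    foreach ($arr2 as $v) {
        if (!isset($index[$v])) {
            continue;                        //O(1) miss, nothing scanned
        }
        foreach ($index[$v] as $pos) {
            unset($arr[$pos[0]][$pos[1]]);   //unset on a gone slot is a no-op
        }
        unset($index[$v]);
        //entries for other values stored in the removed slots go stale, but
        //their later unsets are no-ops, so results stay correct
    }
}
```

Each of the 500,000,000 calls then touches only the slots that actually match, rather than walking the whole structure.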
By a long time, I mean days. I am not entirely convinced that minor
optimization changes to the particular syntax or methods invoked will
yield any order-of-magnitude difference. The order-of-magnitude
difference I need (I think) must come from changing the actual logic of
the code, which is difficult to do in an almost simple iteration
procedure. An analogy: it doesn't matter whether the code is Lance
Armstrong or some big NFL lineman; they are running 100,000 back-to-back
marathons and are going to get tired and start crawling either way.
This is why I feel I am up against a brick wall and must start looking
for a language that runs a bit faster. Some preliminary looping of 1
billion iterations in C++ vs. PHP has yielded a substantial
difference... like a 10^4 magnitude difference in time. This makes me
feel like my bet is justified in translating the code.
I am going to miss PHP ;(
As I know the bottleneck is in the actual execution of the code, APC
and bcompiler won't offer much gain; thanks for the consideration and
for looking into those.
At this point some of you may encourage me to go to C++ so I stop with
this question... but I'd like to hear whether you all agree that perhaps
it is time to pull out the 50 lb lump hammer?
Thanks,
Dan K