Re: PHP console script vs C/C++/C#

Struan Donald wrote:
* at 17/04 16:30 -0500 Daniel Kolbo said:
Hello,

I am writing a PHP script for a local application (not web/html
based). My script is taking a longer time to execute than I want. The
source code is a few thousand lines, so I will spare you all this
level of detail.

I prefer to write in PHP because that is what I know best. However, I do not think there is anything inherent in my script that requires PHP over C/C++/C#.

I think this points to an answer. If you're not too familiar with one
of the compiled languages then producing code that runs faster than
your current PHP implementation is a tall order. PHP, like most
scripting languages, is compiled into an internal format as a first
step and it's this that's then run. A lot of effort has gone into
making this pretty fast and by deciding to rewrite in a compiled
language you are betting that the C code, or whatever, you write will
be faster. Given the effort I imagine translating a few thousand
lines of PHP into one of the languages you name is likely to be
significant you'd want to be sure of winning that bet.

If I wrote the console application in a C language (and compiled it), would one expect to see any improvement in performance? If so, how much improvement could one expect (in general)?

How long will it take you to convert the program? How much more time
will you spend on support and bugfixing?

I assume that because PHP is not compiled, this real-time interpretation of the script by the Zend engine must take some time. This is why I am thinking about rewriting my whole script in a C language. But before I begin that ordeal, I wanted to ask the community for their opinions. If you think using a C language would suit me well, which language would you recommend?

It's not real time interpretation. It's a one time parse and compile
when the script starts and then it runs the internal bytecode. If you
have APC, or some other sort of caching mechanism installed then part
of the speed up comes from caching the bytecode and saving on the
initial parse and compile phase.

As to what language then if you want to go ahead and do this you
should pick the one you know best. If you don't know any of them that
well then I really think that your time would be better spent on
optimising the existing PHP code first. Are you sure it's running as
fast as it can? Do you know where it's slow?
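
A rough sketch of one way to check where the time goes, assuming a reasonably recent PHP (closures and scalar type hints). The helper name time_call and the stand-in workload are made up for illustration and are not part of the original script:

```php
<?php
// Hypothetical helper (not from the original script): run a callable
// $reps times and return the elapsed wall-clock time in seconds.
function time_call(callable $fn, int $reps): float
{
    $start = microtime(true);
    for ($i = 0; $i < $reps; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Stand-in workload; swap in the function you suspect is slow.
$elapsed = time_call(function () {
    $data = range(1, 100);
    in_array(100, $data);
}, 10000);

printf("10000 calls took %.4f s\n", $elapsed);
```

Timing a few candidate functions this way gives only a coarse picture; a real profiler would give a per-function breakdown.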

Rewriting it in another language really is the 50 pound lump hammer
solution to the problem if you've not tried anything else to speed it
up.

My Google and mail archive searching for this yielded mainly PHP for web apps, so I am asking all of you.

My main question is, how much of an improvement in performance will one see by using a compiled version of an application versus using a scripted version of an application?

I looked at PHP's bcompiler, but the documentation is minimal so I am hesitant to dig much deeper into that, unless someone strongly suggests otherwise.

A quick look at the docs tells me that all that bcompiler will do is
save you the initial parse and compile phase and not speed up the
execution of the code.

The one thing you don't say is exactly how long is too long? Is it
hours or minutes? If it's seconds then consider, as someone has
suggested elsewhere in the thread, looking at APC as that should cut
down the start up time of the script.

HTH and apologies if none of this is news to you.

Struan


You are correct in that I want to be pretty sure I win the bet, before translating all the code. The code really isn't that complicated, so I think I am capable of translating it. Just a bunch of pretty small functions. The code is a simulation-model type of program, so the bottleneck in the code is the "looping". I know the exact function that takes up 86% of the time. I have tried to rewrite this function from different approaches. The bottom line is "work" needs to be done, and a lot of it. I really can't think of any other ways to improve the "logic" of the code at this time. Perhaps there are different methods I could be using to speed up execution. Again, I think the source of the issue is looping.

Here is the function that takes 86% of the time. It is called 500,000,000 times with different parameters ($arr and $arr2); I cannot predict their values, only their sizes.

========= C O D E === S T A R T ========
//this function is essentially a search and remove function for a nested array

   foreach ($arr as $key => $value) {
       //count($arr) == 3
       foreach ($value as $key2 => $value2) {
           //0 <= count($value) <= 1000
           foreach ($arr2 as $value3) {
               //count($arr2) == 2
               if (in_array($value3, $value2)) {
                   unset($arr[$key][$key2]);
                   break;
               }
           }
       }
   }

========= C O D E === E N D ========
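
To make the data shape concrete, here is the same loop wrapped in a function and run on made-up sample data. The function name remove_matching_pairs and the sample values are hypothetical; the two-element inner arrays are an assumption inferred from the "|0" and "|1" keys used in the second version of the code:

```php
<?php
// Hypothetical wrapper name; the original post shows only the loop body.
// Removes every inner array that contains any of the values in $arr2.
function remove_matching_pairs(array $arr, array $arr2): array
{
    foreach ($arr as $key => $value) {
        foreach ($value as $key2 => $value2) {
            foreach ($arr2 as $value3) {
                if (in_array($value3, $value2)) {
                    unset($arr[$key][$key2]);
                    break; // this inner array is gone, stop checking it
                }
            }
        }
    }
    return $arr;
}

// Made-up sample data: three groups, each holding two-element arrays.
$arr = [
    'a' => [0 => [1, 2], 1 => [3, 4]],
    'b' => [0 => [2, 5]],
    'c' => [],
];
$result = remove_matching_pairs($arr, [2, 9]);
print_r($result);
```

With this input, the inner arrays containing 2 are removed and only [3, 4] under 'a' survives.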

So essentially 3 nested foreach loops, invoking in_array() and unset(). I rewrote the above code by making $arr a 1-dimensional array and 'storing' the nested key values as a string index with a delimiter, so that I could unset the original nested $arr by exploding this index...I'll just show the code.

========= C O D E 2 ==== S T A R T=======
//first i prepare $arr

function CompressRay($some_nested_ray, $delimiter = "|") {
   //not really compression just flattens the array
   //returns an array of string of key_strings and the final value
   $answer_ray = array();
   foreach ($some_nested_ray as $key => $value) {
       $key_string = (string)$key.$delimiter;
       if (is_array($value)) {
           $compressed_sub_ray = CompressRay($value, $delimiter);
           //echo "Compressed Sub is \n";
           //print_r($compressed_sub_ray);
           foreach ($compressed_sub_ray as $sub_key_string => $final_value) {
               $answer_ray[$key_string.$sub_key_string] = $final_value;
           }
       } else {
           $answer_ray[substr($key_string,0,-1)] = $value;
       }
   }
   return $answer_ray;
}

$arr['compressed'] = CompressRay($arr);
//this part happens quickly, no worries so far

//then i call the below procedure oh, about 500,000,000 times

   foreach ($arr2 as $value3) {
       $key_strings = array_keys($arr['compressed'], $value3);
       foreach ($key_strings as $key_string) {
           $key_sequence = explode("|",$key_string);
           unset($all_vs_holes[$key_sequence[0]][$key_sequence[1]]);
           $upto_hole = substr($key_string,0,-2);
           unset($arr['compressed'][$upto_hole."|0"]);
           //to keep the compressed archive accurate
           unset($arr['compressed'][$upto_hole."|1"]);
           //to keep the compressed archive accurate
       }
   }

========= C O D E 2 ==== E N D=======

To my surprise, code 2 was actually slower: about twice as slow. I started thinking that passing the relatively large $arr by value 500 million times was taking up a lot of time, but some benchmark testing I did showed little improvement (if any) from passing a large array by reference. This seemed counterintuitive to me, and because I would have to make a duplicate copy of $arr if I did pass by reference, it seemed like little gain would come of it.
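
One possible reason for the slowdown: array_keys() with a search value scans the whole flattened array on every call, so code 2 still does a linear search. A sketch of an alternative is an inverted index from each value to the positions that hold it, so that removal becomes a direct lookup. The function names are hypothetical, the values are assumed to be integers or strings (so they can serve as array keys), and this only pays off if the same $arr is reused across many calls, which the description above does not make clear:

```php
<?php
// Hypothetical helper: map each leaf value to the [key, key2]
// positions of the inner arrays that contain it.
function build_value_index(array $arr): array
{
    $index = [];
    foreach ($arr as $key => $group) {
        foreach ($group as $key2 => $pair) {
            foreach ($pair as $v) {
                $index[$v][] = [$key, $key2];
            }
        }
    }
    return $index;
}

// Remove every inner array containing any value in $arr2,
// looking positions up in the index instead of scanning $arr.
function remove_by_index(array &$arr, array &$index, array $arr2): void
{
    foreach ($arr2 as $v) {
        if (!isset($index[$v])) {
            continue; // value does not occur anywhere
        }
        foreach ($index[$v] as [$key, $key2]) {
            unset($arr[$key][$key2]); // harmless if already removed
        }
        unset($index[$v]);
    }
}

// Made-up sample data, same shape as assumed above.
$arr = [
    'a' => [0 => [1, 2], 1 => [3, 4]],
    'b' => [0 => [2, 5]],
    'c' => [],
];
$index = build_value_index($arr);
remove_by_index($arr, $index, [2, 9]);
print_r($arr);
```

Index entries for the other value in a removed inner array are left stale on purpose; a later unset() on an already-removed position does nothing, which is fine as long as positions are never reused.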

Like I said, some work has to be done...these iterations have to be performed. By a long time, I am speaking about days. I am not entirely convinced that minor optimizations to the particular syntax or methods invoked will yield an order-of-magnitude difference. The order-of-magnitude difference I need must (I think) come from changing the actual logic of the code, which is difficult to do in an almost simple iteration procedure. An analogy: it doesn't matter if the code is Lance Armstrong or some big NFL lineman; they are running 100,000 back-to-back marathons and are going to get tired and start crawling either way.

This is why I feel I am up against a brick wall and must start looking for a language that runs a bit faster. Some preliminary looping of 1 billion iterations in C++ vs. PHP has yielded a substantial difference: on the order of 10^4 in run time. This makes me feel like my bet on translating the code is justified.

I am going to miss php ;(

As I know the bottleneck is in the actual execution of the code, APC and bcompiler won't offer much gain. Thanks for the consideration and for looking into those.

At this point some of you may encourage me to go to C++ so that I stop with this question, but I'd like to hear whether you all agree that perhaps it is time to pull out the 50 lb lump hammer?

Thanks,
Dan K

