Re: PHP console script vs C/C++/C#

On Fri, Apr 18, 2008 at 10:53 AM, Daniel Kolbo <kolb0057@xxxxxxx> wrote:
>
>
>
>  Struan Donald wrote:
>
> > * at 17/04 16:30 -0500 Daniel Kolbo said:
> >
> >
> > > Hello,
> > >
> > > I am writing a PHP script for a local application (not web/html
> > > based). My script is taking a longer time to execute than I want. The
> > > source code is a few thousand lines, so I will spare you all this
> > > level of detail.
> > >
> > > I prefer to write in PHP because that is what I know best.  However, I
> > > do not think there is anything inherent with my script that requires PHP
> > > over C/C++/C#.
> > >
> > >
> >
> > I think this points to an answer. If you're not too familiar with one
> > of the compiled languages then producing code that runs faster than
> > your current PHP implementation is a tall order. PHP, like most
> > scripting languages, is compiled into an internal format as a first
> > step and it's this that's then run. A lot of effort has gone into
> > making this pretty fast and by deciding to rewrite in a compiled
> > language you are betting that the C code, or whatever, you write will
> > be faster. Given that translating a few thousand lines of PHP into one
> > of the languages you name is, I imagine, likely to take significant
> > effort, you'd want to be sure of winning that bet.
> >
> >
> >
> > > If I wrote the console application in a c language (and compiled) would
> > > one expect to see any improvements in performance?  If so, how much
> > > improvement could one expect (in general)?
> > >
> > >
> >
> > How long will it take you to convert the program? How much more time
> > will you spend on support and bugfixing?
> >
> >
> >
> > > I assume because php is not compiled that this real time interpretation
> > > of the script by the zend engine must take some time.  This is why I am
> > > thinking about rewriting my whole script in a C language.  But before I
> > > begin that ordeal, i wanted to ask the community for their opinions.  If you
> > > think using a c language would suit me well, what language would you
> > > recommend?
> > >
> > >
> >
> > It's not real time interpretation. It's a one time parse and compile
> > when the script starts and then it runs the internal bytecode. If you
> > have APC, or some other sort of caching mechanism installed then part
> > of the speed up comes from caching the bytecode and saving on the
> > initial parse and compile phase.
> >
> > As to which language: if you want to go ahead and do this, you
> > should pick the one you know best. If you don't know any of them that
> > well then I really think that your time would be better spent on
> > optimising the existing PHP code first. Are you sure it's running as
> > fast as it can? Do you know where it's slow?
> >
> > Rewriting it in another language really is the 50 pound lump hammer
> > solution to the problem if you've not tried anything else to speed it
> > up.
> >
> >
> >
> > > My google and mail archive searching for this yielded mainly PHP for web
> > > apps, so I am asking all of you.
> > >
> > > My main question is, how much of an improvement in performance will one
> > > see by using a compiled version of an application versus using a scripted
> > > version of an application?
> > >
> > > I looked at PHP's bcompiler, but the documentation is minimal so I am
> > > hesitant to dig much deeper into that, unless someone strongly suggests
> > > otherwise.
> > >
> > >
> >
> > A quick look at the docs tells me that all that bcompiler will do is
> > save you the initial parse and compile phase and not speed up the
> > execution of the code.
> >
> > The one thing you don't say is exactly how long is too long? Is it
> > hours or minutes? If it's seconds then consider, as someone has
> > suggested elsewhere in the thread, looking at APC as that should cut
> > down the start up time of the script.
> >
> > HTH and apologies if none of this is news to you.
> >
> > Struan
> >
> >
> >
>
>  You are correct in that I want to be pretty sure I win the bet, before
> translating all the code.  The code really isn't that complicated, so I
> think I am capable of translating it.  Just a bunch of pretty small
> functions.  The code is a simulation-model type of program, so the
> bottleneck in the code is the "looping".  I know the exact function that
> takes up 86% of the time.  I have tried to rewrite this function from
> different approaches.  The bottom line is "work" needs to be done, and a lot
> of it.  I really can't think of any other ways to improve the "logic" of the
> code at this time.  Perhaps there are different methods I could be using to
> speed up execution.  Again, I think the source of the issue is looping.
>
>  Here is the function that takes 86% of the time...This function is called
> 500,000,000 times with different parameters ($arr and $arr2), whose values
> I cannot predict, only their sizes.
>
>  ========= C O D E ===S T AR T ========
>  // This function is essentially a search-and-remove over a nested array.
>
>    foreach ($arr as $key => $value) {
>        // count($arr) == 3
>        foreach ($value as $key2 => $value2) {
>            // 0 <= count($value) <= 1000
>            foreach ($arr2 as $value3) {
>                // count($arr2) == 2
>                if (in_array($value3, $value2)) {
>                    unset($arr[$key][$key2]);
>                    break;
>                }
>            }
>        }
>    }
>
>  ========= C O D E ===E N D ========
>
>  So essentially 3 nested foreach loops, invoking in_array() and unset().
>  I rewrote the above code by making $arr a 1-dimensional array and 'storing'
> the nested key values as a string index with a delimiter, so that I could
> unset the original nested $arr by exploding this index...I'll just show the
> code.
>
>  ========= C O D E 2 ==== S T A R T=======
>  //first i prepare $arr
>
>  function CompressRay($some_nested_ray, $delimiter = "|") {
>    // Not really compression; this just flattens the array.
>    // Returns an array mapping each delimiter-joined key string to its final value.
>    $answer_ray = array();
>      foreach ($some_nested_ray as $key => $value) {
>        $key_string = (string)$key.$delimiter;
>        if (is_array($value)) {
>            $compressed_sub_ray = CompressRay($value, $delimiter);
>            //echo "Compressed Sub is \n";
>            //print_r($compressed_sub_ray);
>            foreach ($compressed_sub_ray as $sub_key_string => $final_value) {
>                $answer_ray[$key_string.$sub_key_string] = $final_value;
>            }
>        }else {
>            $answer_ray[substr($key_string,0,-1)] = $value;
>        }
>    }
>    return $answer_ray;
>  }
>
>  $arr['compressed'] = CompressRay($arr);
>  //this part happens quickly, no worries so far
>
>  //then i call the below procedure oh, about 500,000,000 times
>
>    foreach ($arr2 as $value3) {
>        $key_strings = array_keys($arr['compressed'], $value3);
>        foreach ($key_strings as $key_string) {
>            $key_sequence = explode("|",$key_string);
>            unset($all_vs_holes[$key_sequence[0]][$key_sequence[1]]);
>            $upto_hole = substr($key_string,0,-2);
>            unset($arr['compressed'][$upto_hole."|0"]);
>            //to keep the compressed archive accurate
>            unset($arr['compressed'][$upto_hole."|1"]);
>            //to keep the compressed archive accurate
>        }
>    }
>
>  ========= C O D E 2 ==== E N D=======
>
>  To my surprise, code 2 was actually slower: twice as slow.  I started
> thinking that passing the relatively large $arr by value 500 million
> times was taking up a lot of time, but some benchmark testing I did
> didn't show much improvement (if any) from passing a large array
> by reference.  This seemed counterintuitive to me, and because I would have
> to make a duplicate copy of $arr if I did pass by reference, it seemed like
> little gain would come of it.
>
>  Like I said, some work has to be done...these iterations have to be
> performed.
>  By long time, I am speaking about days.  I am not entirely convinced that
> making minor optimization changes to the particular syntax or methods
> invoked will yield any order of magnitude difference.  The order of
> magnitude difference I need (I think) must come from changing the actual
> logic of the code, which is difficult to do in an almost simple iteration
> procedure.  An analogy: it doesn't matter if the code is Lance Armstrong or
> some big NFL lineman, they are running 100,000 back to back marathons and
> are going to get tired and start crawling either way.
>
>  This is why I feel I am up against a brick wall, and must start looking for
> a language that runs a bit faster.  Some preliminary looping of 1 billion
> iterations in C++ vs. PHP has yielded a substantial difference...something
> like a factor of 10^4 in time.  This makes me feel like my bet is justified
> in translating the code.
>
>  I am going to miss php ;(
>
>  As I know the bottleneck is in the actual execution of the code, APC
> and bcompiler won't offer much gain.  Thanks for considering and
> looking into those.
>
>  At this point some of you may encourage me to go to C++ so I stop with this
> question...but I'd like to hear if you all agree that perhaps it is time to
> pull out the 50 lb lump hammer?
>
>  Thanks,
>  Dan K
>
>

Like I said before, since you know that most of your time is in a
specific part of your script, just move that function into a custom
extension written in C/C++.

http://talks.php.net/show/extending-php-apachecon2003/0
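
Before porting, it may also be worth checking whether the inner search
in the quoted loop can be turned into a lookup. The following is only a
sketch under assumptions the original post does not confirm: that the
same $arr is reused across many calls, and that the innermost values are
integers or strings (so they can serve as array keys). The function
names buildValueIndex() and removeMatches() are made up for
illustration.

```php
<?php
// Hypothetical helper: build a reverse index once, mapping each
// innermost value to the list of [key, key2] positions that hold it.
// Assumes innermost values are ints or strings (valid array keys).
function buildValueIndex(array $arr)
{
    $index = array();
    foreach ($arr as $key => $group) {
        foreach ($group as $key2 => $values) {
            foreach ($values as $v) {
                $index[$v][] = array($key, $key2);
            }
        }
    }
    return $index;
}

// Hypothetical replacement for the nested search: for each value in
// $arr2, unset every $arr[$key][$key2] whose inner array contains it.
// This costs one isset() per needle instead of a scan over all of $arr.
function removeMatches(array &$arr, array $index, array $arr2)
{
    foreach ($arr2 as $needle) {
        if (!isset($index[$needle])) {
            continue;
        }
        foreach ($index[$needle] as $pos) {
            // unset() on an already-removed entry is a harmless no-op,
            // so a stale index entry does not break anything.
            unset($arr[$pos[0]][$pos[1]]);
        }
    }
}

// Usage with made-up sample data.
$arr = array(
    'a' => array(0 => array(1, 2), 1 => array(3, 4)),
    'b' => array(0 => array(5, 6)),
);
$index = buildValueIndex($arr);
removeMatches($arr, $index, array(2, 5));
```

If $arr really is different on every call, building the index costs as
much as the original scan and this gains nothing; in that case the
extension route above is the more likely win.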
