> ========= C O D E === S T A R T ========
> //this is essentially a search and remove routine for a nested array
>
> foreach ($arr as $key => $value) {
>     //count($arr) == 3
>     foreach ($value as $key2 => $value2) {
>         //0 <= count($value) <= 1000
>         foreach ($arr2 as $value3) {
>             //count($arr2) == 2
>             if (in_array($value3, $value2)) {
>                 unset($arr[$key][$key2]);
>                 break;
>             }
>         }
>     }
> }
> ========= C O D E === E N D ========

I can see why you would like to rewrite this code in C, but the problem
comes from looping through 3 nested sets of arrays instead of using small
data models. If you use plain arrays in C/C++ you'll be no better off;
this type of operation is expensive on large amounts of data. You may get
some great compiler optimization in C, though. How big a dataset are we
talking about in these 3 arrays?

> So essentially 3 nested foreach loops, invoking in_array() and unset().
> I rewrote the above code by making $arr a one-dimensional array and
> 'storing' the nested key values as a string index with a delimiter, so
> that I could unset the original nested $arr by exploding this index...
> I'll just show the code.
>
> ========= C O D E 2 ==== S T A R T =======
> //first I prepare $arr
>
> function CompressRay($some_nested_ray, $delimiter = "|") {
>     //not really compression, it just flattens the array
>     //returns an array mapping key strings to the final values
>     $answer_ray = array();
>     foreach ($some_nested_ray as $key => $value) {
>         $key_string = (string)$key.$delimiter;
>         if (is_array($value)) {
>             $compressed_sub_ray = CompressRay($value, $delimiter);
>             //echo "Compressed Sub is \n";
>             //print_r($compressed_sub_ray);
>             foreach ($compressed_sub_ray as $sub_key_string => $final_value) {
>                 $answer_ray[$key_string.$sub_key_string] = $final_value;
>             }
>         } else {
>             $answer_ray[substr($key_string, 0, -1)] = $value;
>         }
>     }
>     return $answer_ray;
> }
>
> $arr['compressed'] = CompressRay($arr);
> //this part happens quickly, no worries so far
>
> //then I call the procedure below, oh, about 500,000,000 times
>
> foreach ($arr2 as $value3) {
>     $key_strings = array_keys($arr['compressed'], $value3);
>     foreach ($key_strings as $key_string) {
>         $key_sequence = explode("|", $key_string);
>         unset($arr[$key_sequence[0]][$key_sequence[1]]);
>         $upto_hole = substr($key_string, 0, -2);
>         unset($arr['compressed'][$upto_hole."|0"]); //keep the compressed archive accurate
>         unset($arr['compressed'][$upto_hole."|1"]); //keep the compressed archive accurate
>     }
> }
> ========= C O D E 2 ==== E N D =======
>
> To my surprise, code 2 was actually slower: twice as slow. I started
> thinking that maybe passing the relatively large $arr by value 500
> million times was taking up a lot of time... but some benchmark testing
> I did showed little improvement (if any) from passing a large array by
> reference. This seemed counterintuitive to me, and because I would have
> to keep a duplicate copy of $arr if I did pass by reference, it seemed
> like little gain would come of it.

Unfortunately, passing by reference is not the same as passing a
pointer. :( PHP arrays are copy-on-write: passing one by value doesn't
actually copy the data until the callee modifies it, which would explain
why your pass-by-reference benchmark showed so little difference.

> Like I said, some work has to be done... these iterations have to be
> performed. By a long time, I am speaking of days. I am not entirely
> convinced that minor optimization changes to the particular syntax or
> methods invoked will yield any order-of-magnitude difference.
> The order-of-magnitude difference I need must (I think) come from
> changing the actual logic of the code, which is difficult to do in an
> almost trivially simple iteration procedure. An analogy: it doesn't
> matter whether the code is Lance Armstrong or some big NFL lineman;
> they are running 100,000 back-to-back marathons and are going to get
> tired and start crawling either way.
>
> This is why I feel I am up against a brick wall and must start looking
> for a language that runs a bit faster. Some preliminary looping of 1
> billion iterations in C++ vs. PHP has yielded a substantial
> difference... something like a 10^4 difference in running time. This
> makes me feel my bet on translating the code is justified.
>
> I am going to miss PHP ;(
>
> Since I know the bottleneck is in the actual execution of the code, APC
> and bcompiler won't offer much gain; thanks for the consideration and
> for looking into those.
>
> At this point some of you may encourage me to go to C++ just so I stop
> asking this question... but I'd like to hear whether you all agree that
> perhaps it is time to pull out the 50 lb lump hammer?

Still disagree. I have a feeling that if you explained the domain a bit
more, and what the input for these functions looks like, we could come up
with a solution that would be sufficiently faster. While you may see some
moderate gains going to C, I think the underlying algorithm could still
use some optimization.
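For example, here is the first thing I would try (a rough sketch,
untested, and it assumes the innermost arrays hold only integer or string
values, which array_flip() requires): flip each innermost array once so
that its values become keys, and every linear in_array() scan turns into
an O(1) isset() hash lookup.

//one-time conversion: leaf values become keys
$flipped = array();
foreach ($arr as $key => $value) {
    foreach ($value as $key2 => $value2) {
        $flipped[$key][$key2] = array_flip($value2);
    }
}

//the same search-and-remove pass, minus the linear scans
foreach ($flipped as $key => $value) {
    foreach ($value as $key2 => $value2) {
        foreach ($arr2 as $value3) {
            if (isset($value2[$value3])) { //O(1) instead of O(count($value2))
                unset($arr[$key][$key2]);
                unset($flipped[$key][$key2]); //keep the lookup copy in sync
                break;
            }
        }
    }
}

How much that buys you depends on how big the innermost arrays are, which
is part of why I keep asking about the dataset.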
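The second thing, given that you run the removal pass about 500,000,000
times: invert the data once into an index from each leaf value to the
[key, key2] slots that contain it, so each pass becomes a couple of hash
lookups instead of an array_keys() scan over the whole flattened array.
Again only a sketch, untested, under the same int/string-value
assumption, with variable names of my own invention:

//build once: map each leaf value to every ($key, $key2) slot containing it
$where = array();
foreach ($arr as $key => $value) {
    foreach ($value as $key2 => $value2) {
        foreach ($value2 as $leaf) {
            $where[$leaf][] = array($key, $key2);
        }
    }
}

//each removal pass is now a handful of hash lookups
foreach ($arr2 as $value3) {
    if (!isset($where[$value3])) {
        continue; //this value occurs nowhere in $arr
    }
    foreach ($where[$value3] as $pos) {
        //unset() on a slot an earlier probe already removed is a harmless no-op
        unset($arr[$pos[0]][$pos[1]]);
    }
    unset($where[$value3]);
}

The index costs memory proportional to the total number of leaf values,
but it changes the logic of the code rather than the syntax, which is
where the order-of-magnitude win you are after usually hides.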
>
> Thanks,
> Dan K

--
Nick Stinemates (nick@xxxxxxxxxxxxxx)
http://nick.stinemates.org

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php