Re: references, arrays, and why does it take up so much memory?

On 3 Sep 2013, at 02:30, Daevid Vincent <daevid@xxxxxxxxxx> wrote:

> I'm confused on how a reference works I think.
> 
> I have a DB result set in an array I'm looping over. All I simply want to do
> is make the array key the "id" of the result set row.
> 
> This is the basic gist of it:
> 
>       private function _normalize_result_set()
>       {
>              foreach($this->tmp_results as $k => $v)
>              {
>                     $id = $v['id'];
>                     $new_tmp_results[$id] =& $v; //2013-08-29 [dv] using a reference here cuts the memory usage in half!

You are assigning a reference to $v. On the next iteration of the loop, $v holds the next item in the array, and so does every reference to it that you have stored so far. With this code I'd expect $new_tmp_results to end up as an array where the keys (i.e. the IDs) are correct but the data in every item matches the last item from the original array, which appears to be what you describe.
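A minimal sketch of that effect, using made-up sample rows (the names $rows, $by_ref and $by_val are illustrative only and not from the original code). It should show the keys coming out right while every value stored by reference ends up as the last row:

```php
<?php
// Made-up sample data standing in for the DB result set.
$rows = array(
    array('id' => 10, 'name' => 'a'),
    array('id' => 20, 'name' => 'b'),
    array('id' => 30, 'name' => 'c'),
);

// Storing a reference to the loop variable: every element aliases $v.
$by_ref = array();
foreach ($rows as $k => $v) {
    $by_ref[$v['id']] =& $v;
}
unset($v); // detach $v so later code cannot overwrite the stored elements

// Plain assignment: each element keeps its own row.
$by_val = array();
foreach ($rows as $row) {
    $by_val[$row['id']] = $row;
}

var_dump($by_ref[10]['name']); // same value as the last row
var_dump($by_val[10]['name']); // the row that actually has id 10
```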

>                     unset($this->tmp_results[$k]);

Doing this on every iteration is likely to be very inefficient. I don't know exactly how PHP handles this internally, but I wouldn't be surprised if it allocates a new chunk of memory for a version of the array without this element. You may find it better not to unset anything until the loop has finished, at which point you can just unset($this->tmp_results).

> 
>                     /*
>                     if ($i++ % 1000 == 0)
>                     {
>                           gc_enable(); // Enable Garbage Collector
>                           var_dump(gc_enabled()); // true
>                           var_dump(gc_collect_cycles()); // # of elements cleaned up
>                           gc_disable(); // Disable Garbage Collector
>                     }
>                     */
>              }
>              $this->tmp_results = $new_tmp_results;
>              //var_dump($this->tmp_results); exit;
>              unset($new_tmp_results);
>       }


Try this:

private function _normalize_result_set()
{
  // Initialise the temporary variable.
  $new_tmp_results = array();

  // Loop around just the keys in the array.
  foreach (array_keys($this->tmp_results) as $k)
  {
    // Store the item in the temporary array with the ID as the key.
    // Note no pointless variable for the ID, and no use of &!
    $new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
  }

  // Assign the temporary variable to the original variable.
  $this->tmp_results = $new_tmp_results;
}

I'd appreciate it if you could plug this in and see what your memory usage reports say. In most cases, trying to control the garbage collection through the use of references is the worst way to go about optimising your code. In my code above I'm relying on PHP's copy-on-write behaviour, where assigned data is only duplicated if it is subsequently changed. No unsets, just scope marking a variable as ready to be cleaned up.
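For comparing memory reports, here is a rough standalone sketch. The function name normalize_by_id() and the generated sample data are assumptions for illustration; the real figures will depend on your PHP version and your actual rows, so no particular numbers are implied:

```php
<?php
// Hypothetical standalone version of the method above, for measurement only.
function normalize_by_id(array $rows)
{
    $by_id = array();
    foreach (array_keys($rows) as $k) {
        // Plain assignment; copy-on-write avoids duplicating unchanged rows.
        $by_id[$rows[$k]['id']] = $rows[$k];
    }
    return $by_id;
}

// Made-up sample data: 10,000 rows with ids 1001..11000.
$rows = array();
for ($i = 0; $i < 10000; $i++) {
    $rows[] = array('id' => 1001 + $i, 'name' => 'row ' . $i);
}

$before = memory_get_usage();
$rows = normalize_by_id($rows);
$after = memory_get_usage();

printf(
    "rows: %d, memory delta: %d bytes, peak: %d bytes\n",
    count($rows),
    $after - $before,
    memory_get_peak_usage()
);
```

Running the same script with the reference-based loop swapped in should give a like-for-like comparison.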

Where is this result set coming from? You'd save yourself a lot of memory and time by putting the data into this format when you read it from the source. For example, if reading it from MySQL, do $this->tmp_results[$row['id']] = $row when looping around the result set.
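A sketch of that idea, assuming a hypothetical fetch_rows() generator as a stand-in for whatever fetch loop your database layer provides (it is not a real MySQL API, and generators need PHP 5.5 or later):

```php
<?php
// Hypothetical stand-in for a database fetch loop; yields one row at a time.
function fetch_rows()
{
    yield array('id' => 7, 'name' => 'alpha');
    yield array('id' => 3, 'name' => 'beta');
    yield array('id' => 9, 'name' => 'gamma');
}

// Key each row by its id as it is read, so no second pass is needed.
$results = array();
foreach (fetch_rows() as $row) {
    $results[$row['id']] = $row;
}

print_r(array_keys($results));
```

With a real result set the foreach would be replaced by your driver's own fetch loop; the assignment inside it stays the same.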

Also, is there any reason why you need to process this full set of data in one go? Can you not break it up into smaller pieces that won't put as much strain on resources?

-Stuart

-- 
Stuart Dallas
3ft9 Ltd
http://3ft9.com/

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php