RE: references, arrays, and why does it take up so much memory? [SOLVED]

EUREKA!

> -----Original Message-----
> From: Stuart Dallas [mailto:stuart@xxxxxxxx]
> Sent: Tuesday, September 03, 2013 6:31 AM
> To: Daevid Vincent
> Cc: php-general@xxxxxxxxxxxxx
> Subject: Re: references, arrays, and why does it take up so much memory?
> 
> On 3 Sep 2013, at 02:30, Daevid Vincent <daevid@xxxxxxxxxx> wrote:
> 
> > I'm confused on how a reference works I think.
> >
> > I have a DB result set in an array I'm looping over. All I simply want to do
> > is make the array key the "id" of the result set row.
> >
> > This is the basic gist of it:
> >
> >       private function _normalize_result_set()
> >       {
> >              foreach($this->tmp_results as $k => $v)
> >              {
> >                     $id = $v['id'];
> >                     //2013-08-29 [dv] using a reference here cuts the memory usage in half!
> >                     $new_tmp_results[$id] =& $v;
> 
> You are assigning a reference to $v. In the next iteration of the loop, $v
> will be pointing at the next item in the array, as will the reference you're
> storing here. With this code I'd expect $new_tmp_results to be an array
> where the keys (i.e. the IDs) are correct, but the data in each item matches
> the data in the last item from the original array, which appears to be what
> you describe.
> 
> >                     unset($this->tmp_results[$k]);
> 
> Doing this for every loop is likely very inefficient. I don't know how the
> inner workings of PHP process something like this, but I wouldn't be
> surprised if it's allocating a new chunk of memory for a version of the
> array without this element. You may find it better to not unset anything
> until the loop has finished, at which point you can just
> unset($this->tmp_results).
> 
> >
> >                     /*
> >                     if ($i++ % 1000 == 0)
> >                     {
> >                           gc_enable(); // Enable Garbage Collector
> >                           var_dump(gc_enabled()); // true
> >                           var_dump(gc_collect_cycles()); // # of elements cleaned up
> >                           gc_disable(); // Disable Garbage Collector
> >                     }
> >                     */
> >              }
> >              $this->tmp_results = $new_tmp_results;
> >              //var_dump($this->tmp_results); exit;
> >              unset($new_tmp_results);
> >       }
> 
> 
> Try this:
> 
> private function _normalize_result_set()
> {
>   // Initialise the temporary variable.
>   $new_tmp_results = array();
> 
>   // Loop around just the keys in the array.
>   foreach (array_keys($this->tmp_results) as $k)
>   {
>     // Store the item in the temporary array with the ID as the key.
>     // Note no pointless variable for the ID, and no use of &!
>     $new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
>   }
> 
>   // Assign the temporary variable to the original variable.
>   $this->tmp_results = $new_tmp_results;
> }
> 
> I'd appreciate it if you could plug this in and see what your memory usage
> reports say. In most cases, trying to control the garbage collection through
> the use of references is the worst way to go about optimising your code. In
> my code above I'm relying on PHP's copy-on-write feature, where assigned data
> is only duplicated if it is later changed. No unsets, just using scope to mark
> a variable as able to be cleaned up.
> 
> Where is this result set coming from? You'd save yourself a lot of
> memory/time by putting the data into this format when you read it from the
> source. For example, if reading it from MySQL,
> $this->tmp_results[$row['id']] = $row when looping around the result set.
> 
> Also, is there any reason why you need to process this full set of data in
> one go? Can you not break it up into smaller pieces that won't put as much
> strain on resources?
> 
> -Stuart
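
For readers following the thread, the reference behaviour Stuart describes can be reproduced with a minimal, self-contained sketch. The function names and sample rows below are made up for illustration and are not part of the original class:

```php
<?php
// Hypothetical sample rows standing in for a DB result set.
$rows = array(
    array('id' => 10, 'title' => 'first'),
    array('id' => 20, 'title' => 'second'),
    array('id' => 30, 'title' => 'third'),
);

// Problematic: every element is bound by reference to the same loop variable $v,
// so each new iteration overwrites what all earlier keys point at.
function rekey_by_reference(array $rows)
{
    $out = array();
    foreach ($rows as $v) {
        $out[$v['id']] =& $v;
    }
    return $out;
}

// Plain assignment: each key gets its own row (shared copy-on-write until modified).
function rekey_by_value(array $rows)
{
    $out = array();
    foreach ($rows as $v) {
        $out[$v['id']] = $v;
    }
    return $out;
}

$bad  = rekey_by_reference($rows);
$good = rekey_by_value($rows);

echo $bad[10]['title'], " vs ", $good[10]['title'], "\n";
```

With the reference version the keys are right but every entry carries the last row's data; the by-value version keeps each row distinct.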

There were reasons I had the $id -- I only showed the relevant parts of the
code for the sake of not overly complicating what I was trying to illustrate.
There is other processing that had to be done in the loop too, and that is
also what I illustrated.

Here is your version effectively:

	private function _normalize_result_set() //Stuart
	{
		if (!$this->tmp_results || count($this->tmp_results) < 1) return;

		$new_tmp_results = array();

		// Loop around just the keys in the array.
		$D_start_mem_usage = memory_get_usage();
		foreach (array_keys($this->tmp_results) as $k)
		{
			/*
			if ($this->tmp_results[$k]['genres'])
			{
				// rip through each scene's `genres` and store them as an array since we'll need'em later too
				$g = explode('|', $this->tmp_results[$k]['genres']);
				array_pop($g); // there is an extra '' element due to the final | character. :-\
				$this->tmp_results[$k]['g'] = $g;
			}
			*/

			// Store the item in the temporary array with the ID as the key.
			// Note no pointless variable for the ID, and no use of &!
			$new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
		}

		// Assign the temporary variable to the original variable.
		$this->tmp_results = $new_tmp_results;
		echo "\nMEMORY USED FOR STUART's version: ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";
		var_dump($this->tmp_results);
		exit();
	}

MEMORY USED FOR STUART's version: -128 PEAK: (90,439,680)

With the processing in the genres block
MEMORY USED FOR STUART's version: 97,264,368 PEAK: (187,695,104)

So a slight improvement in peak usage over the original, a difference of
28,573,696 bytes:
MEMORY USED FOR _normalize_result_set(): 97,264,912 PEAK: (216,268,800)
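
The -128 figure above is consistent with the copy-on-write behaviour Stuart mentioned: assigning an array does not duplicate it until one side is written to. A rough sketch that shows this with memory_get_usage() follows; the function name and array size are arbitrary choices for illustration, and exact byte counts will vary by PHP version:

```php
<?php
// Hypothetical helper: measures memory growth for a plain copy versus
// a copy that is then modified.
function measure_copy_on_write()
{
    $a = range(1, 100000);

    $before = memory_get_usage();
    $b = $a;                         // no duplication yet, both names share one array
    $after_copy = memory_get_usage();
    $b[] = 0;                        // first write forces PHP to duplicate the array
    $after_write = memory_get_usage();

    return array(
        'copy'  => $after_copy - $before,
        'write' => $after_write - $after_copy,
        'sizes' => array(count($a), count($b)),
    );
}

$result = measure_copy_on_write();
echo "copy: ".number_format($result['copy'])." write: ".number_format($result['write'])."\n";
```

The 'copy' delta should be negligible next to the 'write' delta, which is why re-keying rows without touching them costs so little.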


No matter what I tried, however, it frustratingly seems that the simple act
of adding a new hash to the array causes a significant memory jump. That
really blows! Therefore my solution was to not store the $g as ['g'] -- which
would seem to be the more efficient approach, doing the work once and
re-using the array over and over -- and instead I am forced to rip through
and explode() inline in three different places in my code.
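
Since the same explode() now has to live in several places, a small helper would at least keep the parsing consistent. This is only a sketch: parse_genres() is a hypothetical name, and it assumes the `genres` column is a pipe-delimited string with a trailing '|' as described in the commented-out block above.

```php
<?php
// Hypothetical helper: split a pipe-delimited genres string into an array,
// dropping the empty element caused by the trailing '|'.
function parse_genres($genres)
{
    if ($genres === null || $genres === '') {
        return array();
    }
    // PREG_SPLIT_NO_EMPTY discards empty pieces, so no array_pop() is needed.
    return preg_split('/\|/', $genres, -1, PREG_SPLIT_NO_EMPTY);
}

$g = parse_genres('action|comedy|drama|');
print_r($g);
```

Unlike array_pop(), this does not drop a real genre when the trailing '|' happens to be missing, though it also discards empty pieces in the middle of the string.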

We get over 30,000 hits per second, and even with lots of caching, 216MB vs
70-96MB is significant and the speed hit is only about 1.5 seconds more per
page.

Here are three distinctly different example pages that exercise different
parts of the code path:

PAGE RENDERED IN 7.0466279983521 SECONDS
MEMORY USED @START: 262,144 - @END: 26,738,688 = 26,476,544 BYTES
MEMORY PEAK USAGE: 69,730,304 BYTES

PAGE RENDERED IN 6.9327299594879 SECONDS
MEMORY USED @START: 262,144 - @END: 53,739,520 = 53,477,376 BYTES
MEMORY PEAK USAGE: 79,167,488 BYTES

PAGE RENDERED IN 7.558168888092 SECONDS
MEMORY USED @START: 262,144 - @END: 50,855,936 = 50,593,792 BYTES
MEMORY PEAK USAGE: 96,206,848 BYTES

Furthermore, I investigated what Jim Giner suggested, and it turns out I
could wedge into our Connection class a way to mangle the results at that
point, which is actually a more elegant solution overall as we can re-use
it in many more places going forward.

	/**
	 * Execute a database SQL query and return all the results in an associative array
	 *
	 * @access	public
	 * @return	array or false
	 * @param	string $sql the SQL code to execute
	 * @param	boolean $print (false) Print a color coded version of the query.
	 * @param	boolean $get_first (false) return the first element only. useful for when 1 row is returned such as "LIMIT 1"
	 * @param	string $key (null) if a column name, such as 'id' is used here, then that column will be used as the array key
	 * @author	Daevid Vincent [daevid@xxxxxxxx]
	 * @date	2013-09-03
	 * @see	get_instance(), execute(), fetch_query_pair()
	 */
	public function fetch_query($sql = "", $print = false, $get_first = false, $key = null)
	{
		//$D_start_mem_usage = memory_get_usage();
		if (!$this->execute($sql, $print)) return false;

		$tmp = array();

		if (is_null($key))
			while ($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[] = $arr;
		else
			while ($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[$arr[$key]] = $arr;

		$this->free_result(); // freeing result from memory
		//echo "\nMEMORY USED FOR fetch_query(): ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";
		return (($get_first) ? array_shift($tmp) : $tmp);
	}
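
A usage sketch of the same idea, with the database taken out so it runs standalone. key_rows_by() is a hypothetical stand-in for the $key branch of fetch_query() above, and the sample rows are invented. Note also that MYSQL_ASSOC belongs to the old mysql extension, which was removed in PHP 7, so newer code would need the mysqli or PDO equivalent.

```php
<?php
// Hypothetical stand-in for the $key branch of fetch_query(): index each
// row by the named column as it is read, so no second pass is needed.
function key_rows_by($rows, $key = null)
{
    $tmp = array();
    foreach ($rows as $row) {
        if ($key === null) {
            $tmp[] = $row;              // default: sequential numeric keys
        } else {
            $tmp[$row[$key]] = $row;    // keyed by the chosen column
        }
    }
    return $tmp;
}

// Invented sample rows in place of a real result set.
$rows = array(
    array('id' => 7, 'title' => 'alpha'),
    array('id' => 9, 'title' => 'beta'),
);

$by_id = key_rows_by($rows, 'id');
print_r(array_keys($by_id));
```

One thing to watch for: if the chosen column is not unique, later rows silently overwrite earlier ones with the same key.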



-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php




