EUREKA!

> -----Original Message-----
> From: Stuart Dallas [mailto:stuart@xxxxxxxx]
> Sent: Tuesday, September 03, 2013 6:31 AM
> To: Daevid Vincent
> Cc: php-general@xxxxxxxxxxxxx
> Subject: Re: refernces, arrays, and why does it take up so much memory?
>
> On 3 Sep 2013, at 02:30, Daevid Vincent <daevid@xxxxxxxxxx> wrote:
>
> > I'm confused on how a reference works I think.
> >
> > I have a DB result set in an array I'm looping over. All I simply want to do
> > is make the array key the "id" of the result set row.
> >
> > This is the basic gist of it:
> >
> >     private function _normalize_result_set()
> >     {
> >         foreach($this->tmp_results as $k => $v)
> >         {
> >             $id = $v['id'];
> >             $new_tmp_results[$id] =& $v; //2013-08-29 [dv] using a reference here cuts the memory usage in half!
>
> You are assigning a reference to $v. In the next iteration of the loop, $v
> will be pointing at the next item in the array, as will the reference you're
> storing here. With this code I'd expect $new_tmp_results to be an array
> where the keys (i.e. the IDs) are correct, but the data in each item matches
> the data in the last item from the original array, which appears to be what
> you describe.
>
> >             unset($this->tmp_results[$k]);
>
> Doing this for every loop is likely very inefficient. I don't know how the
> inner workings of PHP process something like this, but I wouldn't be
> surprised if it's allocating a new chunk of memory for a version of the
> array without this element. You may find it better to not unset anything
> until the loop has finished, at which point you can just
> unset($this->tmp_results).
>
> >
> >             /*
> >             if ($i++ % 1000 == 0)
> >             {
> >                 gc_enable(); // Enable Garbage Collector
> >                 var_dump(gc_enabled()); // true
> >                 var_dump(gc_collect_cycles()); // # of elements cleaned up
> >                 gc_disable(); // Disable Garbage Collector
> >             }
> >             */
> >         }
> >         $this->tmp_results = $new_tmp_results;
> >         //var_dump($this->tmp_results); exit;
> >         unset($new_tmp_results);
> >     }
>
>
> Try this:
>
> private function _normalize_result_set()
> {
>     // Initialise the temporary variable.
>     $new_tmp_results = array();
>
>     // Loop around just the keys in the array.
>     foreach (array_keys($this->tmp_results) as $k)
>     {
>         // Store the item in the temporary array with the ID as the key.
>         // Note no pointless variable for the ID, and no use of &!
>         $new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
>     }
>
>     // Assign the temporary variable to the original variable.
>     $this->tmp_results = $new_tmp_results;
> }
>
> I'd appreciate it if you could plug this in and see what your memory usage
> reports say. In most cases, trying to control the garbage collection through
> the use of references is the worst way to go about optimising your code. In
> my code above I'm relying on PHP's copy-on-write feature where data is only
> duplicated when assigned if it changes. No unsets, just using scope to mark
> a variable as able to be cleaned up.
>
> Where is this result set coming from? You'd save yourself a lot of
> memory/time by putting the data into this format when you read it from the
> source. For example, if reading it from MySQL,
> $this->tmp_results[$row['id']] = $row when looping around the result set.
>
> Also, is there any reason why you need to process this full set of data in
> one go? Can you not break it up into smaller pieces that won't put as much
> strain on resources?
>
> -Stuart

There were reasons I had the $id -- I only showed the relevant parts of the code for the sake of not overly complicating what I was trying to illustrate.
There is other processing that had to be done in the loop too, and that is also what I illustrated.

Here is your version, effectively:

private function _normalize_result_set() //Stuart
{
    if (!$this->tmp_results || count($this->tmp_results) < 1) return;

    $new_tmp_results = array();

    // Loop around just the keys in the array.
    $D_start_mem_usage = memory_get_usage();
    foreach (array_keys($this->tmp_results) as $k)
    {
        /*
        if ($this->tmp_results[$k]['genres'])
        {
            // rip through each scene's `genres` and store them as an array since we'll need'em later too
            $g = explode('|', $this->tmp_results[$k]['genres']);
            array_pop($g); // there is an extra '' element due to the final | character. :-\
            $this->tmp_results[$k]['g'] = $g;
        }
        */

        // Store the item in the temporary array with the ID as the key.
        // Note no pointless variable for the ID, and no use of &!
        $new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
    }

    // Assign the temporary variable to the original variable.
    $this->tmp_results = $new_tmp_results;
    echo "\nMEMORY USED FOR STUART's version: ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";
    var_dump($this->tmp_results);
    exit();
}

With the genres block commented out, as shown:

MEMORY USED FOR STUART's version: -128 PEAK: (90,439,680)

With the processing in the genres block enabled:

MEMORY USED FOR STUART's version: 97,264,368 PEAK: (187,695,104)

So that is a slight improvement on the original, with a peak 28,573,696 bytes lower. The original reported:

MEMORY USED FOR _normalize_result_set(): 97,264,912 PEAK: (216,268,800)

No matter what I tried, however, it seems that, frustratingly, just the simple act of adding a new hash to each row of the array causes a significant memory jump. That really blows! Therefore my solution was to not store the $g as ['g'] at all. Doing the split once and re-using the array over and over would seem to be the more efficient way, but instead I am forced to rip through and explode() inline in three different places of my code.
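A minimal sketch of what such an inline split could look like, assuming `genres` is a pipe-delimited string with a trailing '|' as in the commented-out block above. The helper name split_genres() and the sample row are hypothetical, made up for this illustration and not taken from the real code:

```php
<?php
// Hypothetical helper (name invented for illustration): turn a pipe-delimited
// genres string such as "action|comedy|" into array('action', 'comedy').
function split_genres($genres)
{
    // Strip trailing '|' characters first so explode() does not produce
    // an empty final element.
    $trimmed = rtrim((string) $genres, '|');
    if ($trimmed === '') {
        return array(); // NULL, '' or a lone '|' all mean "no genres"
    }
    return explode('|', $trimmed);
}

// Usage: split at the point of use instead of storing ['g'] on every row.
$row = array('id' => 42, 'genres' => 'action|comedy|'); // sample data, not real
foreach (split_genres($row['genres']) as $genre) {
    echo $genre, "\n";
}
```

The trade-off is the one described above: the split is repeated at each call site, but no extra array is kept alive per row for the lifetime of the result set.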
We get over 30,000 hits per second, and even with lots of caching, 216MB vs 70-96MB is significant and the speed hit is only about 1.5 seconds more per page.

Here are three distinctly different example pages that exercise different parts of the code path:

PAGE RENDERED IN 7.0466279983521 SECONDS
MEMORY USED @START: 262,144 - @END: 26,738,688 = 26,476,544 BYTES
MEMORY PEAK USAGE: 69,730,304 BYTES

PAGE RENDERED IN 6.9327299594879 SECONDS
MEMORY USED @START: 262,144 - @END: 53,739,520 = 53,477,376 BYTES
MEMORY PEAK USAGE: 79,167,488 BYTES

PAGE RENDERED IN 7.558168888092 SECONDS
MEMORY USED @START: 262,144 - @END: 50,855,936 = 50,593,792 BYTES
MEMORY PEAK USAGE: 96,206,848 BYTES

Furthermore, I investigated what Jim Giner suggested, and it turns out I could wedge a hook into our Connection class to mangle the results at that point. That is actually a more elegant solution overall, as we can re-use it in many more places going forward.

/**
 * Execute a database SQL query and return all the results in an associative array
 *
 * @access  public
 * @return  array or false
 * @param   string $sql the SQL code to execute
 * @param   boolean $print (false) Print a color coded version of the query.
 * @param   boolean $get_first (false) return the first element only. useful for when 1 row is returned such as "LIMIT 1"
 * @param   string $key (null) if a column name, such as 'id' is used here, then that column will be used as the array key
 * @author  Daevid Vincent [daevid@xxxxxxxx]
 * @date    2013-09-03
 * @see     get_instance(), execute(), fetch_query_pair()
 */
public function fetch_query($sql = "", $print = false, $get_first=false, $key=null)
{
    //$D_start_mem_usage = memory_get_usage();

    if (!$this->execute($sql, $print)) return false;

    $tmp = array();
    if (is_null($key))
        while($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[] = $arr;
    else
        while($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[$arr[$key]] = $arr;

    $this->free_result(); // freeing result from memory

    //echo "\nMEMORY USED FOR fetch_query(): ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";

    return (($get_first) ? array_shift($tmp) : $tmp);
}

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
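
P.S. A minimal sketch of the $key idea from fetch_query() in isolation, so it can be run without a database. The function index_rows() and the sample rows are hypothetical, made up for illustration; $rows stands in for whatever fetch_array() would hand back one row at a time:

```php
<?php
// Hypothetical stand-in for the fetch loop in fetch_query(): $rows plays the
// role of the rows that fetch_array() would return one at a time.
function index_rows($rows, $key = null)
{
    $tmp = array();
    foreach ($rows as $arr) {
        if (is_null($key)) {
            $tmp[] = $arr;            // plain 0..n list
        } else {
            $tmp[$arr[$key]] = $arr;  // keyed by the chosen column
        }
    }
    return $tmp;
}

// Sample data, not real rows.
$rows = array(
    array('id' => 7,  'title' => 'first'),
    array('id' => 42, 'title' => 'second'),
    array('id' => 7,  'title' => 'third'), // same id as the first row
);

$by_id = index_rows($rows, 'id');
print_r(array_keys($by_id));   // the keys are now the ids
echo $by_id[7]['title'], "\n"; // a later row with the same key replaces the earlier one
```

One thing the sketch makes visible: if the chosen column is not unique, later rows silently overwrite earlier ones, so this only suits columns like a primary key. On PHP 5.5 or later, array_column($rows, null, 'id') performs the same re-keying on an array that is already in memory, though keying the rows while fetching avoids the second pass.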