array_diff odities

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Howdy folks.  I'm running into something strange with array_diff that
I'm hoping someone can shed some light on.

I have two tab-delimited text files, and need to find the lines in the
first that are not in the second, and vice-versa.

There are 794 records in the first, and 724 in the second.  

Simple enough, I thought.  The following code should work:

$tmpOriginalGradList = file('/path/to/graduate_list_original.txt');
$tmpNewGradList = file('/path/to/graduate_list_new.txt');

$diff1 = array_diff($tmpOriginalGradList, $tmpNewGradList);
$diff2 = array_diff($tmpNewGradList, $tmpOriginalGradList);

I expected that this would set $diff1 to have all elements of
$tmpOriginalGradList that did not exist in $tmpNewGradList, but it
actually contains many elements that exist in both.

The same is true for $diff2, in that many of its elements exist in both
$tmpOriginalGradList and $tmpNewGradList as well.

Since this returns $diff1 as having 253 elements and $diff2 as having
183, it sort of makes sense, since the difference between those two
numbers is 70, which is the difference between the number of lines in
the two files.  But the bottom line is that both $diff1 and $diff2
contain elements common to both files, which using array_diff simply
should not be the case.

However, when I loop through each file and strip out all the tabs:

foreach ($tmpOriginalGradList as $k=>$l) {
	$tmp = str_replace(chr(9), '', $l);
	$tmpOriginalGradList[$k] = $tmp;
}

foreach ($tmpNewGradList as $k=>$l) {
	$tmp = str_replace(chr(9), '', $l);
	$tmpNewGradList[$k] = $tmp;
}

I get $diff1 as having 75 elements and $diff2 as having 5, which also
sort of makes sense since there numerically there are 70 lines
difference between the two files.

I also manually replaced the tabs and checked about 20 of the elements
in $diff1 and none were found in the new text file, and none of the 5
elements in $diff2 were found in the original text file.

However, if in the code above I replace the tabs with a space instead of
just stripping them out, then the numbers are again 253 and 183.

I'm inclined to think the second set of results is accurate, since I was
unable to find any of the 20 elements I tested in $diff1 in the new text
file, and none of the elements in $diff2 are in the original text file.

Does anyone have any idea why this is happening?  The tab-delimited
files were generated from Excel spreadsheets using the same script, so
there wouldn't be any difference in the formatting of the files.

This one has me confused as I really thought the simple code posted
above should have worked.

If anyone can pass along any advice I would greatly appreciate it.

Cheers and TIA,

Pablo

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



[Index of Archives]     [PHP Home]     [Apache Users]     [PHP on Windows]     [Kernel Newbies]     [PHP Install]     [PHP Classes]     [Pear]     [Postgresql]     [Postgresql PHP]     [PHP on Windows]     [PHP Database Programming]     [PHP SOAP]

  Powered by Linux