To: "PHP" <php-general@xxxxxxxxxxxxx>
Sent: Wednesday, May 18, 2005 11:03 PM
Subject: array_diff oddities
Howdy folks. I'm running into something strange with array_diff that I'm hoping someone can shed some light on.
I have two tab-delimited text files, and need to find the lines in the first that are not in the second, and vice-versa.
There are 794 records in the first, and 724 in the second.
Simple enough, I thought. The following code should work:
$tmpOriginalGradList = file('/path/to/graduate_list_original.txt');
$tmpNewGradList = file('/path/to/graduate_list_new.txt');
$diff1 = array_diff($tmpOriginalGradList, $tmpNewGradList);
$diff2 = array_diff($tmpNewGradList, $tmpOriginalGradList);
I expected that this would set $diff1 to have all elements of $tmpOriginalGradList that did not exist in $tmpNewGradList, but it actually contains many elements that exist in both.
The same is true for $diff2, in that many of its elements exist in both $tmpOriginalGradList and $tmpNewGradList as well.
This gives 253 elements in $diff1 and 183 in $diff2. The counts are at least plausible, since they differ by 70, which is the difference in line count between the two files (794 - 724). But the bottom line is that both $diff1 and $diff2 contain elements common to both files, which should never happen with array_diff.
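For reference, here is a minimal sketch of the behaviour expected from array_diff, together with an independent cross-check. The sample lines and the helper name lines_not_in() are made up for illustration; they are not the real graduate data. array_diff treats two elements as equal when their string representations are identical, so a plain lookup-table comparison should return exactly the same lines.

```php
<?php
// Hypothetical sample data standing in for the two files.
$original = array("Ann\tSmith\n", "Bob\tJones\n", "Cy\tLee\n");
$new      = array("Bob\tJones\n", "Dee\tPark\n");

// Independent cross-check (hypothetical helper): collect the lines of $a
// that never occur in $b, using array keys as a lookup table.
function lines_not_in($a, $b)
{
    $seen = array_flip($b);        // line => index in $b
    $out  = array();
    foreach ($a as $k => $line) {
        if (!isset($seen[$line])) {
            $out[$k] = $line;      // keep the original key, as array_diff does
        }
    }
    return $out;
}

$diff1  = array_diff($original, $new);
$check1 = lines_not_in($original, $new);

// If both approaches agree, this prints bool(true).
var_dump($diff1 === $check1);
```

If the two results disagree on the real files, that would point at array_diff itself rather than at the data.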
However, when I loop through each file and strip out all the tabs:
foreach ($tmpOriginalGradList as $k => $l) {
    $tmpOriginalGradList[$k] = str_replace(chr(9), '', $l);
}
foreach ($tmpNewGradList as $k => $l) {
    $tmpNewGradList[$k] = str_replace(chr(9), '', $l);
}
I get 75 elements in $diff1 and 5 in $diff2, which also sort of makes sense, since the two files differ by 70 lines.
I also manually replaced the tabs and checked about 20 of the elements in $diff1 and none were found in the new text file, and none of the 5 elements in $diff2 were found in the original text file.
However, if in the code above I replace the tabs with a space instead of just stripping them out, then the numbers are again 253 and 183.
I'm inclined to think the second set of results is accurate, since I was unable to find any of the 20 elements I tested in $diff1 in the new text file, and none of the elements in $diff2 are in the original text file.
Does anyone have any idea why this is happening? The tab-delimited files were generated from Excel spreadsheets using the same script, so there wouldn't be any difference in the formatting of the files.
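One way to rule out formatting differences is to print suspect lines with their control characters escaped, so that two lines which look the same on screen can be compared byte for byte. This is only a rough sketch: the helper name show_hidden() and the sample lines are invented for illustration, and the idea that invisible characters (a stray carriage return, say) are involved is a guess, not a confirmed cause.

```php
<?php
// Hypothetical helper: return a printable form of a line, with control
// characters and bytes above 126 escaped.
function show_hidden($line)
{
    return addcslashes($line, "\0..\37\177..\377");
}

// Made-up example: two lines that print alike but end differently.
$a = "Bob\tJones\r\n";
$b = "Bob\tJones\n";

echo show_hidden($a), "\n";    // Bob\tJones\r\n
echo show_hidden($b), "\n";    // Bob\tJones\n

var_dump($a === $b);                                 // bool(false)
var_dump(rtrim($a, "\r\n") === rtrim($b, "\r\n"));   // bool(true)
```

Running a few of the unexpected $diff1 entries and their apparent twins from the other file through something like this would show whether they really are identical strings.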
This one has me confused as I really thought the simple code posted above should have worked.
If anyone can pass along any advice I would greatly appreciate it.
Cheers and TIA,
Pablo
-- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php