On 7/7/23 11:42 AM, home user wrote:
When I try to verify a back-up, I use "diff -r". The directory trees being compared contain about 870 files (mostly binary, like PNG, JPG, and so on), and take up about 707 megabytes. The trees being compared are on the hard drive and on a USB-3 stick. When I run the "diff -r" command, it seems to finish too quickly - it seems like less than a half of a second. I saw similar results a few weeks ago comparing about 30 gigabyte trees on the hard drive vs. on a USB-3.1 stick; the results were practically instantaneous. Is diff actually checking every bit (or byte), or is it using some "short cut"?
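As a rough sanity check on that timing, the figures above can be turned into an implied read rate. This is only a sketch: the 5 Gbit/s figure is the nominal USB 3.0 signalling rate and is an assumption about the stick, and the function name is made up for illustration.

```python
def implied_rate_mb_per_s(megabytes, seconds):
    """Read rate needed to pull `megabytes` off a device in `seconds`."""
    return megabytes / seconds


# Assumption: a USB 3.0 port signalling at 5 Gbit/s, protocol overhead ignored.
USB3_NOMINAL_MBIT_PER_S = 5000
ceiling_mb_per_s = USB3_NOMINAL_MBIT_PER_S / 8

# Figures from the question: about 707 megabytes in under half a second.
rate = implied_rate_mb_per_s(707, 0.5)

print(f"implied rate: {rate:.0f} MB/s, nominal ceiling: {ceiling_mb_per_s:.0f} MB/s")
```

If the implied rate comes out above the nominal ceiling, the bytes cannot all have been read from the stick in that time, which is what made the result look suspicious.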
Before opening this thread, I had already spent a lot of time and effort verifying that "diff" worked correctly on binary files. The issue was that diff seemed to compare large directory trees of files too quickly, which led me to believe it was using a "short cut" rather than actually comparing file contents.

I believe that some short cuts should be used:
* files of different sizes should be reported as being different without comparing contents.
* once one bit is found to differ between two files, they should be reported as different without comparing the remaining contents.

But contents should be compared even if two files have the same names, sizes, creation/modification histories, permissions, and other meta-data values. This was not happening. Ron's tests showed that my suspicion was correct: an inappropriate (in my opinion) short cut was being used. So, Roberto, short cuts are sometimes used. Ron also provided the solution, doing as root:

    sync ; echo 3 > /proc/sys/vm/drop_caches

Patrick's point that diff wasn't meant for binary files is correct, but without a recursive option, cmp doesn't really help unless I want to write a script to do the recursive traversal of the 2 trees, calling cmp on every file that's in both trees. I struggle with recursion; trying it makes me curse and re-curse and curse yet more.

Patrick's suggestion to use rsync is a good one. Robert's suggestion to use the "-c" option is also good. But Wikipedia claims that checksums are not perfect, that it is remotely possible for files with identical checksums to differ.

Years ago, when I worked on the AWIPS program at the National Weather Service, I needed a file restored from the regular back-up done by the sys.admins. They couldn't do it. That taught me the importance of checking back-ups. George's early June comments (in a different thread) about USB sticks taught me the importance of back-up checks being deep, at least occasionally.

I've tagged this thread SOLVED.
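For anyone who wants the "cmp on every file in both trees" approach without writing the recursion by hand, here is a minimal sketch in Python. It is not what diff or cmp do internally. The names files_identical, relative_files and compare_trees are made up for illustration. It uses the two short cuts listed above (size check first, stop at the first difference) and otherwise compares every byte. It does not handle broken symbolic links or special files.

```python
import os

CHUNK = 1 << 20  # read 1 MiB at a time


def files_identical(path_a, path_b):
    """Return True only if both files hold exactly the same bytes."""
    # Short cut 1: different sizes means different contents.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(CHUNK)
            block_b = fb.read(CHUNK)
            # Short cut 2: stop at the first block that differs.
            if block_a != block_b:
                return False
            if not block_a:  # both files ended at the same point
                return True


def relative_files(root):
    """Collect every file under root as a path relative to root."""
    found = set()
    # os.walk does the tree traversal, so no hand-written recursion is needed.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(root_a, root_b):
    """Return (files present in only one tree, files whose contents differ)."""
    files_a = relative_files(root_a)
    files_b = relative_files(root_b)
    only_one = files_a ^ files_b
    differing = {
        rel
        for rel in files_a & files_b
        if not files_identical(os.path.join(root_a, rel), os.path.join(root_b, rel))
    }
    return sorted(only_one), sorted(differing)
```

Calling compare_trees("/path/to/original", "/path/to/backup") (both paths are placeholders) returns two sorted lists. Like diff, this reads through the kernel's page cache, so the cache-dropping step above still matters if the goal is to re-read the data from the stick itself. The standard library's filecmp.cmp(a, b, shallow=False) also does a content comparison of two files and could replace files_identical.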
"rsync --dry-run -c" seems to be a good solution in many cases, but "diff -r" is better when a truly deep check is preferred. I thank everyone for their contributions. _______________________________________________ users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue