Re: question: does "diff" use short cuts? [SOLVED]

On 7/7/23 11:42 AM, home user wrote:
> When I try to verify a back-up, I use "diff -r".  The directory trees being compared contain about 870 files (mostly binary, like PNG, JPG, and so on), and take up about 707 megabytes.  The trees being compared are on the hard drive and on a USB-3 stick.  When I run the "diff -r" command, it seems to finish too quickly - it seems like less than a half of a second.  I saw similar results a few weeks ago comparing about 30 gigabyte trees on the hard drive vs. on a USB-3.1 stick; the results were practically instantaneous.  Is diff actually checking every bit (or byte), or is it using some "short cut"?

Before opening this thread, I had already spent a lot of time and effort verifying that "diff" worked correctly on binary files.  The issue was that diff seemed to compare large directory trees of files too quickly, which led me to believe it was using a "short cut" rather than actually comparing file contents.  I believe that some short cuts should be used:
* files of different sizes should be reported as being different without comparing contents.
* once one bit is found to differ between two files, they should be reported as different without comparing the remaining contents.
But contents should be compared even if two files have the same name, size, creation/modification history, permissions, and other meta-data values.  This was not happening.

Ron's tests showed that my suspicion was correct: an inappropriate (in my opinion) short cut was being used.  So, Roberto, short cuts are sometimes used.  Ron also provided the solution, doing as root:
      sync ; echo 3 > /proc/sys/vm/drop_caches
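
For reference, a minimal sketch of how that command could be combined with the comparison itself.  Writing 3 to drop_caches asks the kernel to free the page cache along with dentries and inodes, so a "diff -r" run right afterwards should have to read the files from the devices again rather than from memory.  The function name verify_cold_cache and its two arguments are made up for illustration; this is one possible arrangement, not anything Ron posted.

```shell
# verify_cold_cache ORIGINAL BACKUP   (hypothetical helper name)
# Must run as root: writing to /proc/sys/vm/drop_caches needs privileges.
verify_cold_cache() {
    original=$1
    backup=$2
    if [ "$(id -u)" -ne 0 ]; then
        echo "verify_cold_cache: must be run as root" >&2
        return 2
    fi
    sync                                 # write dirty pages out first
    echo 3 > /proc/sys/vm/drop_caches    # free page cache, dentries and inodes
    diff -r "$original" "$backup"        # exit status 0 means no differences found
}
```

It would be called as root with the two tree roots as arguments, for example the directory on the hard drive and the mount point of the USB stick.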

Patrick's point that diff wasn't meant for binary files is correct, but without a recursive option, cmp doesn't really help unless I want to write a script to do the recursive traversal of the two trees, calling cmp on every file that's in both trees.  I struggle with recursion; trying it makes me curse and re-curse and curse yet more.  Patrick's suggestion to use rsync is a good one.  Robert's suggestion to use the "-c" option is also good.  But Wikipedia claims that checksums are not perfect, that it is remotely possible for files with identical checksums to differ.
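
The traversal does not need hand-written recursion, because find can walk the tree and hand each path to cmp.  A minimal sketch, assuming POSIX sh and file names without newline characters; the function name compare_trees and the output wording are made up for illustration.  It walks only the first tree, so files that exist only in the second tree are not reported.

```shell
# compare_trees SRC DST   (hypothetical helper name)
# Runs cmp on every regular file under SRC against the file with the
# same relative path under DST.  Returns 0 if all match, 1 otherwise.
compare_trees() {
    src=$1
    dst=$2
    status=0
    # find does the recursive walk; the here-document (rather than a pipe)
    # keeps the loop in the current shell so that $status survives it.
    while IFS= read -r rel; do
        [ -n "$rel" ] || continue        # empty tree yields one empty line
        if [ ! -f "$dst/$rel" ]; then
            echo "missing in $dst: $rel"
            status=1
        elif ! cmp -s "$src/$rel" "$dst/$rel"; then
            # cmp -s is silent and stops at the first differing byte
            echo "differs: $rel"
            status=1
        fi
    done <<EOF
$(cd "$src" && find . -type f)
EOF
    return $status
}
```

Calling compare_trees with the original directory and the back-up directory prints one line per missing or differing file and prints nothing when every file matches.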

Years ago, when I worked on the AWIPS program at the National Weather Service, I needed a file restored from the regular back-up done by the sysadmins.  They couldn't do it.  That taught me the importance of checking back-ups.  George's early June comments (in a different thread) about USB sticks taught me the importance of back-up checks being deep, at least occasionally.

I've tagged this thread SOLVED.  "rsync --dry-run -c" seems to be a good solution in many cases, but "diff -r" is better when a truly deep check is preferred.  I thank everyone for their contributions.